EFFORT AND SELECTION EFFECTS OF PERFORMANCE PAY IN KNOWLEDGE CREATION∗

Erina Ytsma†

It is well documented that performance pay has positive effort and selection effects in routine, easy-to-measure tasks, but its effect in knowledge creation is much less understood. This paper studies the effects of explicit and implicit, market-based incentives commonly found in knowledge work industries in a multi-tasking model. It then estimates the causal effort and selection effects of performance incentives in knowledge creation by exploiting the introduction of performance pay in German academia as a natural experiment, using a newly constructed dataset of the universe of German academics. I find that performance incentives attract more productive academics, and research quantity increases by 14 to 18%, but there is no increase in the highest quality output.

JEL J33, M52, O31

1 INTRODUCTION

Knowledge work is an important pillar of present-day economies. It has become rapidly more prevalent over the last four decades and exhibited consistent growth in occupational employment share¹ (Autor, 2019). Furthermore, knowledge creation has long been considered an important driver of economic growth (Romer, 1986; Lucas, 1988). Yet much is still unclear about how to motivate knowledge workers, including how they respond to performance pay. This paper sheds

∗I would like to thank Laurence Ales, Pierre Azoulay, Oriana Bandiera, Jordi Blanes-i-Vidal, Kenneth Corts, Pablo Casas-Arce, Baran Duzce, Florian Englmaier, Jeff Furman, Maitreesh Ghatak, Bob Gibbons, De Gruyter, Rosario Macera, Bentley Macleod, Michal Matejka, Bob Miller, Steve Pischke, Andrea Prat, Carol Propper, John van Reenen, Mark Schankerman, Axel Schniederjuergen, Ananya Sen, Chris Stanton, Scott Stern, Neil Thompson, Fabian Waldinger, the ministries of education of the German states and seminar and conference participants at DRUID 2016, the 31st EEA Congress, ESNASM 2017, ESEM 2017, ESEWM 17, IOEA 18, SIOE 18, BePE 2018, AEA 2019, GEIRC 2019, EAA 2019, GSE-OE 2019, AAA 2019, NBER Personnel Summer Institute 2019, MAS Midyear 2020, CMU MAC 2019, MIT TIES, MIT IDE, MIT OE Lunch, Universidad Carlos III, Copenhagen Business School, Carnegie Mellon Tepper, UANDES, Universite Laval and MPI Munich for helpful comments, information or data. A previous version of this paper was titled “Career Concerns in Knowledge Creation”.
†Carnegie Mellon University, Tepper School of Business. Tepper Building Room 4123, 5000 Forbes Avenue, Pittsburgh, PA 15213. e-mail: [email protected]. Phone: +1-412-268-1117.
¹Knowledge work is defined here as non-routine cognitive jobs that comprise a host of intellectual tasks.

light on the effect of performance pay on knowledge creation by causally identifying the effort and selection effects of performance incentives in academia.

It is by now well understood that performance pay increases productivity in routine tasks and settings in which output is readily measurable (e.g. car window replacement, fruit picking, students’ test scores), through increases in effort or by attracting the most productive individuals (Lazear, 2000; Shearer, 2004; Bandiera, Barankay and Rasul, 2005; Leuven et al., 2011; Dohmen and Falk, 2011). However, it is not clear that performance pay would have the same effects in the context of knowledge work. For one, knowledge work generally comprises multiple, complex tasks, the output of which is often not measurable or only a noisy signal of effort. Multi-tasking problems are therefore likely to arise (Holmstrom and Milgrom, 1991; Hellmann and Thiele, 2011). Secondly, because quality dimensions such as impact and novelty are valuable outcome characteristics for many types of knowledge work, incentive systems may need to be structured differently, with a longer time horizon and allowing for (early) exploration and experimentation (Azoulay, Graff Zivin and Manso, 2011; Manso, 2011; Ederer and Manso, 2013). Finally, knowledge workers may be particularly highly intrinsically motivated. Higher-powered extrinsic incentives may crowd out this intrinsic motivation, thus potentially reducing knowledge output (Bénabou and Tirole, 2003, 2006; Besley and Ghatak, 2005, 2018).

In this paper, I study the effect of performance incentives on the quantity and quality of knowledge output and the productivity of knowledge workers attracted by high-powered incentives both empirically and theoretically.
I present a simple multi-tasking model with explicit and implicit, market-based incentives commonly found in knowledge work industries and use this to derive testable implications for both average incentive effects and heterogeneous responses across ability types. I test the model’s predictions by exploiting the introduction of performance pay in German academia as a natural experiment, and using a newly constructed dataset of the universe of German academics². The specifics of the roll-out of the performance-related pay scheme give rise to a differential incidence of performance incentives across tenure and age cohorts. This allows me to causally and separately identify the effort and selection effects in a difference-in-differences framework.

The theoretical model presented in this paper builds on Gibbons and Murphy (1992) and features both explicit performance incentives (bonuses for performance on the job) and implicit, market-based incentives (wage supplements negotiated in contract talks). This combination of explicit and implicit incentives is a common feature of pay structures in knowledge work industries³. The model also incorporates multi-tasking issues as output has two dimensions, quantity and quality, the latter of which is less precisely measured (cf. Holmstrom and Milgrom, 1991). Both output dimensions increase with effort as well as agent ability, and agent ability is imperfectly known by the market and the agent (i.e. there is symmetric uncertainty about agent

²This dataset is also used in Ytsma (2021).
³Market-based wages determined in contract negotiations (career concerns) and on-the-job performance bonuses are common performance incentives in academia and knowledge creation jobs more generally (Bonatti and Hörner, 2017), as well as managerial jobs (Gibbons and Murphy, 1992) and professional jobs such as in law (Ferrer, 2016), finance (Hong, Kubik and Solomon, 2000; Chevalier and Ellison, 1999) and software development (Lerner and Wulf, 2007).

ability). The market then uses output measures as signals of both effort and agent ability, to inform agent pay. This, in turn, creates incentives for the agent to exert effort. Because quality is less precisely measured, the incentives to exert effort toward quality are relatively weaker. Yet because the market takes both quantity and quality output into account to update beliefs about agent ability, incentives to exert effort toward quality are not absent either, and they are stronger for higher ability academics, for whom exerting effort towards both quantity and quality is assumed to be relatively less costly. In equilibrium, and relative to a flat wage, output quantity goes up unambiguously in response to performance pay, but output quality increases only if the quality output measure is sufficiently precise. These responses are not uniform across ability types. Performance pay increases quantity effort the most for the lowest ability workers, and least for workers of intermediate ability. Quality effort, on the other hand, increases the most or decreases the least for the most able workers and decreases the most for those of intermediate ability. That is, there is no simple substitution of quantity for quality; rather, the degree to which there is substitution varies across ability types and depends on the noise with which output dimensions are measured. Furthermore, the higher-powered incentives attract workers of higher ability (in expectation), since they can expect to earn more under performance pay.

In order to empirically analyze the effect of performance pay in academia, I constructed a data set comprising the affiliations, research productivity measures and related information of the universe of academics in Germany by consolidating information from various unstructured data sources.
To estimate the effort effect, I use the fact that any contract signed or renegotiated after the implementation of the reform necessarily falls under the new performance pay scheme, while any existing contract continues to fall under the old age-related pay scheme. Academics who start their first tenured affiliation before the reform therefore fall under the age-related pay scheme, while those who start their first tenured affiliation after the reform are paid according to the performance pay scheme. If the timing of the start of the first tenured affiliation is exogenous, any difference in the change of productivity from before to after the reform between academics who start their first tenured position just before the reform and those who start a first tenured position directly after the reform can be interpreted as the causal effect of performance pay on effort.

I find that performance pay increases research quantity and quality-adjusted quantity by at least 14 to 18% on average⁴. At the same time, the average quality of publications⁵ decreases by 9 to 10% in response to performance pay. The response in output quantity is equivalent to treated academics publishing almost one extra paper every three years, while the decline in average quality is equivalent to a decrease of almost 0.22 in the impact factor of the journal in which the average publication appears (this is roughly equal to e.g. the difference in 2-year impact factor between the Journal of Political Economy and the American Economic Journal: Applied Economics (Clarivate Analytics, 2017)) or 3.6 fewer citations to publications. To get a better idea

⁴Pure research quantity is measured by the number of publications, while the impact factor-rated number of publications and the number of citations to publications at least six years after publication are used as measures of quality-adjusted quantity.
⁵Measured as average impact factor rating and average number of citations per paper.

of the quality of the work produced in response to performance pay, I analyze the distribution of citations and impact factors. I find that treated academics produce more of low to medium-high quality work, but not more of the highest quality research. These effort effects arise in response to the implicit performance incentives of the wage premiums determined in contract negotiations, with no additional significant effort response to the explicit on-the-job performance incentives. Furthermore, the effort effects are highly persistent; they do not diminish before the end of the study period eight years after implementation of the reform.

I find no evidence of pre-existing trends, which lends support to the identifying parallel trends assumption of the difference-in-differences estimation, and no effect of the performance pay reform on academic effort in a placebo difference-in-differences estimation with two cohorts that tenure before the reform. The results are also robust to estimation with synthetic cohorts, where assignment to the treatment and control cohort is determined by the average age at which academics start their first tenured affiliation instead of the actual timing of the first tenured affiliation. These tests lend support to the causal interpretation of the effort effect estimates.

The average effort effect estimates hide a considerable amount of heterogeneity, which helps explain the absence of an increase in the highest quality work. The greatest increase in quantity comes from the relatively less productive academics, who tend not to produce the highest quality work. High ability academics also increase the number of papers they produce, but they do not produce more of the highest quality work. Medium ability academics, finally, even decrease the number of high-quality papers they produce.
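The difference-in-differences comparison behind the effort effect can be illustrated with a minimal sketch. It assumes a simple two-group, two-period setup computed from four cell means; the function name `did_estimate` and the toy numbers are hypothetical, and the estimating equations actually used in the paper (which are not shown in this section) may include further controls and fixed effects.

```python
import numpy as np


def did_estimate(y, treated, post):
    """Two-by-two difference-in-differences from four cell means.

    y       : outcome per observation (e.g. publications per year)
    treated : True if the academic belongs to the treated cohort
    post    : True if the observation falls after the reform
    """
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    post = np.asarray(post, dtype=bool)

    def cell_mean(is_treated, is_post):
        # Mean outcome within one cohort-by-period cell.
        return y[(treated == is_treated) & (post == is_post)].mean()

    change_treated = cell_mean(True, True) - cell_mean(True, False)
    change_control = cell_mean(False, True) - cell_mean(False, False)
    # Under parallel trends, the difference in changes is the treatment effect.
    return change_treated - change_control


# Hypothetical toy data, not taken from the paper's dataset.
toy_y = [2.0, 2.2, 2.5, 2.9, 2.0, 2.2, 2.1, 2.3]
toy_treated = [1, 1, 1, 1, 0, 0, 0, 0]
toy_post = [0, 0, 1, 1, 0, 0, 1, 1]
toy_effect = did_estimate(toy_y, toy_treated, toy_post)
```

In the toy data the treated cohort's mean rises by 0.6 and the control cohort's by 0.1, so the sketch attributes the remaining 0.5 to treatment; that attribution is only valid if the two cohorts would otherwise have followed parallel trends.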
To provide further evidence on the quality and impact of the work produced in response to performance incentives, I use textual analysis techniques to construct metrics for the similarity of papers to past and future publications to gauge their novelty and impact, respectively. These metrics are based on cosine similarities of vector representations of the abstracts of papers, similar to Kelly et al. (forthcoming). The analyses confirm the quality response results; the additional work produced in response to performance pay is, on average, not the most novel or impactful. Top productivity academics produce more medium to highly novel work that generates mid- to high-level follow-on, low productivity academics produce additional papers that are just above the median in terms of both similarity to past and future work, while sub-top productivity academics produce more papers that are very novel, but also garner only very little follow-on work.

Finally, I estimate the selection effect of performance incentives by testing for differential changes in switching hazard rates by age and cohort for academics across average productivity levels in another difference-in-differences framework. For the first dimension of variation, tenure cohort, I use the fact that only academics who already hold a tenured affiliation before the reform select into the performance pay scheme when they change their affiliation, position or contract. Academics who start their first tenured affiliation after the reform automatically fall under the performance pay scheme from the start. Accordingly, the treated-control designation for the selection analysis is the opposite of what it was for the effort effect analysis, with academics

who hold a tenured affiliation before the reform now comprising the treated cohort. The second dimension of variation I exploit here is the age of an academic. The basic wage schemes of the age-related and performance pay schemes compare differently at different ages. The schemes intersect only once, and wages increase with age in the age-related scheme. The performance pay scheme is therefore relatively less attractive for older academics, and selection incentives decrease with age. In line with this, I find that higher productivity increases the switching rates of treated academics less when they are older. Put differently, older academics need to be of relatively higher productivity in order for them to change affiliation or renegotiate their contract if such a change means switching to performance pay. Performance pay thus attracts more productive academics.

Taken together, this paper shows that performance pay schemes commonly found in knowledge creation jobs significantly increase knowledge output quantity. The additional knowledge output produced is, however, not of the highest quality, for two reasons. Firstly, the quantity effort response is strongest for relatively less productive academics, whose work is not of the highest quality on average. Secondly, academics just below the productivity top actually decrease output quality, while the output quality of academics at the top of the productivity distribution does not change. At the same time, performance pay is effective in attracting more productive academics. Though these academics increase output quantity, they do not produce more of the highest quality work, so the two effects do not reinforce each other. Importantly, this means that the nature of the task matters for the effect performance pay has on output.
If tasks are complex, in that effort and output are multi-dimensional and output measures are noisy, the effort response might be more positive or positive only in the dimension(s) with less noisy output measures. In knowledge work and professional work, measures for output quality are likely more noisy. This paper shows that in such a setting performance pay may fail to increase high quality output and may decrease average quality.

By studying the effect of performance pay on a non-routine, multi-dimensional and hard-to-measure task like knowledge creation, this paper contributes to the vast literature on performance incentives (cf. Lazear and Oyer (2012); Oyer and Schaefer (2011) for reviews), much of which has focused on more routine tasks with more precise output measures (e.g. Bandiera, Barankay and Rasul (2005); Shearer (2004); Lazear (2000)). Moreover, the (empirical) literature on incentives has mostly studied the effects of explicit performance incentives, ranging from piece-rate pay (Dohmen and Falk, 2011; Bandiera, Barankay and Rasul, 2005; Shearer, 2004; Lazear, 2000) or bonus pay (Hossain and List, 2012; Muralidharan and Sundararaman, 2011; Lavy, 2009) to tournament schemes (Leuven et al., 2011; Carpenter, Matthews and Schirm, 2010; Freeman and Gelber, 2010) and monitoring regimes (Boly, 2011; Dickinson and Villeval, 2008). This paper, on the other hand, studies the effort and selection effects of explicit and implicit performance incentives, in the form of career concerns, as these are common incentives in knowledge work and professional jobs. As such, the paper contributes to the literature on career concerns as well (Bonatti and Hörner, 2017; Miklós-Thal and Ullrich, 2016; Ferrer, 2016;

Holmström, 1999; Gibbons and Murphy, 1992; Holmström, 1982).

This paper also contributes to the literature on incentives for innovation and knowledge creation as well as the literature on university governance (McCormack, Propper and Smith, 2014; Haeck and Verboven, 2012; Aghion et al., 2010) and the organization of knowledge creation (Jones, 2009; Wuchty, Jones and Uzzi, 2007; Audretsch and Feldman, 1996; Jaffe, Trajtenberg and Henderson, 1993). Many papers in the former literature look at incentives for commercializable knowledge and the commercialization of knowledge (patenting) (Hvide and Jones, 2018; Hall and Harhoff, 2012; Azoulay, Ding and Stuart, 2009; Lach and Schankerman, 2008, 2004). In so doing, the academic fields that can be studied are generally restricted to the (applied) sciences, and it is frequently difficult to differentiate between changes in the production of knowledge that can be commercialized and the rates at which such knowledge is commercialized (patented). Of the papers that study incentives for academic (basic) research more generally, many focus on the effects of funding or awards (Borjas and Doran, 2015; Chan et al., 2014; Azoulay, Graff Zivin and Manso, 2011). In these settings, it is difficult to distinguish between the effect of funding itself and the incentive effect of funding or awards on output. Furthermore, it is difficult to distinguish between the effort effect of performance incentives and selection into the funding or award schemes. This paper contributes to this literature by providing causal and separate evidence of the effort and selection effect of common performance incentives in knowledge creation, across all academic fields.

Finally, by studying the effect of performance incentives on the quality, impact and novelty of knowledge output, this paper also contributes to the literature on incentives for novelty and creativity.
A number of recent papers study the effects of competition in online platforms on the creativity or novelty of outputs such as logo designs (Gross, 2016), novels (Wu and Zhu, 2018) and software algorithms (Boudreau, Lacetera and Lakhani, 2011; Boudreau, Lakhani and Menietti, 2016), with mixed results. Gibbs, Neckermann and Siemroth (2017) find that rewarding employees for ideas for product and process improvements raises the quality of ideas submitted in a field experiment setting, while Erat and Gneezy (2016) find that piece-rate pay does not alter creativity, though competitive incentives reduce creativity in a lab experiment. Finally, Ederer and Manso (2013) show in another lab setting that the structure of rewards, allowing for early failure and rewarding long-term success, is important for innovation.

The paper is structured as follows: the next section provides information on the institutional background and section 3 presents the theoretical model. The empirical analysis makes up section 4, with the first part focusing on the effort effect and the second part on the selection effect of performance pay. Section 5 concludes.

2 INSTITUTIONAL BACKGROUND

The German academic pay reform that I exploit as a natural experiment in this paper introduced a new pay scheme (“W-pay”) under which professors can earn performance-related bonuses

on top of a basic wage (BMBF, 2002). These performance-related bonuses can be substantial, potentially more than doubling a professor’s monthly wage. Before the reform, professors were paid according to an age-related pay scheme (“C-pay”) in which pay increased at a pre-determined rate with age (Hochschullehrerbund, 2009; Oeffentlicher Dienst, 2004; Expertenkommission, 2000).

2.1 Performance Pay (W-Pay)

There are three basic pay levels in the performance pay scheme: W1 pays a basic monthly wage of 3405.34 euro, W2 3890.03 euro and W3 4723.60 euro⁶ (Detmer and Preissler, 2006; Oeffentlicher Dienst, 2004). Tenured professors receive either W2 or W3 pay. Professors can earn performance bonuses on top of these basic wages in the W-pay scheme in three ways: as an attraction or retention bonus, for on-the-job performance, and for taking on management roles or tasks (BMBF, 2002). Although federal and state laws lay down the ground rules for the performance pay, universities have discretion over whom to award performance bonuses and how much⁷. Many universities set out procedures for the award of bonuses in university statute supplements (Detmer and Preissler, 2006). Only tenured professors can earn substantial bonuses in the performance pay scheme. I therefore restrict attention to tenured professors when analyzing effort and selection effects of the pay reform in this paper.

The first kind of performance bonus, the attraction or retention bonus, is a wage premium that is determined as part of contract (re)negotiations and generally awarded on the basis of a professor’s qualifications and past achievements and performance, taking into account applicant pool quality and labor market tightness (Detmer and Preissler, 2005). In order to be able to negotiate such a bonus, professors are often required to show proof of an outside offer (Detmer and Preissler, 2005). The attraction and retention bonuses are thus implicit, market-based incentives. These incentives do not derive from an explicit performance contract between an academic and any particular university, but rather from the academic’s expectation to be able to influence future attraction or retention bonuses, either at the current university or another, by exerting more effort now.
Since an academic’s past performance is used to update beliefs about the academic’s ability and the (bonus) pay offered is driven to reflect beliefs about the academic’s expected productivity under the influence of competition in the labor market, these incentives take the form of career concerns. Career concerns incentives are very common in academia and knowledge creation jobs more generally (Bonatti and Hörner, 2017; Lerner and Wulf, 2007), as well as in managerial and professional jobs (Chevalier and Ellison, 1999; Gibbons and Murphy, 1992).

⁶These were the basic wage levels under the performance pay scheme in former West-German states determined as of 1 August 2004. The corresponding basic wage levels in former East-German states were 92.5% of the West-German rates (Detmer and Preissler, 2006).
⁷See Handel (2005) for a comprehensive overview of how much discretion higher education institutes have regarding hiring and pay decisions after the reform in the different German states. Only very few states (e.g. Bremen) require the state’s minister of education to have a say in bonus negotiations (Detmer and Preissler, 2006).

In the German performance pay system, the career concerns incentives should take effect from the moment the reform is announced for those individuals that anticipate they may fall under the performance pay scheme at some point in the future, since improved performance before the implementation of the reform will increase the chance they can command lucrative attraction or retention bonuses after implementation of the reform. I use this timing to separately identify the causal effort effect of the implicit, career concerns incentives from that of explicit performance incentives, to which I turn next.

The second type of bonus introduced with the pay reform consists of bonuses for on-the-job performance. These can be awarded for performance in research, art, teaching, mentoring and supervision (BMBF, 2002). To assess research performance, for instance, universities take into account the number and quality/rank of publications, external research grants, patents and research prizes, while exceptional teaching evaluations, the development of didactic methods and teaching grants and prizes can serve as evidence of special teaching achievements (Detmer and Preissler, 2005; Universitaet Regensburg, 2016; Gien, 2017). Most state laws stipulate that the performance for which on-the-job bonuses are awarded must have been effected over multiple years (often at least 3 years) (Handel, 2005). Universities generally have the option to award attraction or retention and on-the-job bonuses on a permanent basis, for a fixed term (initially) or even as a one-off payment (Detmer and Preissler, 2004, 2005). If bonuses are awarded for a fixed term with the option of renewal, universities frequently enter into a target agreement with the respective professor, especially if it concerns an attraction bonus for a first-time tenured professor (Detmer and Preissler, 2006).
The target agreement specifies the achievements, such as the number and type of publications and grants, that are expected of the professor in a 3- or 5-year period. If these targets are met, the attraction or retention bonus continues to be paid, either for another 3- to 5-year period, or permanently. Target agreements may also allow for partial fulfillment, such that, when a professor secures external funds below a certain threshold, for instance, the bonus that they (continue to) receive is lower (Detmer and Preissler, 2006). The on-the-job bonuses and target agreements constitute explicit performance incentives of the sort commonly seen in knowledge creation, managerial and professional jobs (Lerner and Wulf, 2007; Hong, Kubik and Solomon, 2000; Gibbons and Murphy, 1992). They take effect only after the reform is implemented, when professors enter into the performance pay scheme.

The third kind of bonus in the performance pay scheme comprises pay supplements for taking on management tasks or roles (BMBF, 2002). These bonuses are paid as lump-sum payments for the duration of the task or role and are therefore not performance incentives.

Finally, the reform also introduced the option for professors to extract pay supplements from third-party awarded funds for research or teaching projects for the duration of such projects (BMBF, 2002). To the extent that grant application committees use an academic’s past performance to update their beliefs about the academic’s ability and chances of success, academics who anticipate falling under the performance pay scheme have an incentive to improve their

performance from the moment the reform is announced in an attempt to improve their chances of winning a grant and earning a pay supplement. The grant pay supplements thus introduce implicit performance incentives of a career concerns nature as well.

The performance bonuses are not a rare occurrence. In 2006, only 23% of the professors in the performance pay scheme did not receive a performance bonus (BMI, 2007). Of the different types of bonuses, the attraction and retention bonus is the most important, both in terms of frequency of award and total amount awarded. Attraction and retention bonuses were awarded almost six times as often as on-the-job performance bonuses in 2005 and more than three times as often in 2006, and they were awarded almost six times as often as function-specific bonuses in both years (BMI, 2007). Furthermore, about 75% of the total amount of bonus pay in the performance pay scheme up until 2008 was awarded as attraction or retention bonus (Biester, 2010).

2.2 Comparison With Age-Related Pay (C-Pay)

There are four pay levels in the age-related pay scheme (C1-C4); university professors are generally awarded C3 or C4 pay. The monthly salary in these levels increases every two years by roughly 170 euro⁸, from the age of 21 to the age of 49 (Hochschullehrerbund, 2009; Oeffentlicher Dienst, 2004; Expertenkommission, 2000). In contrast, the basic wage in the W-pay scheme does not vary with age, and the level is such that professors earn a higher before-bonus wage in the performance pay scheme at first, but once they get older, they would earn a higher basic wage in the C-pay system. Appendix Figure A.1 depicts the monthly wage by age for the several pay levels in the performance pay and age-related pay schemes. Thus the basic wage schedules have a single crossing point, the location of which depends on the specific pay level of an academic; the age-related wage starts to exceed the basic wage in the performance pay scheme either at age 33 or at age 43 (cf. Appendix Figure A.1) (Oeffentlicher Dienst, 2004; Handel, 2005).

Before the pay reform, professors in the highest pay level of the age-related pay scheme (C4) could earn bonuses when they received offers after their first appointment as C4-professor. These bonuses were standardized to be around 650 euro per month for the second C4-offer, and about 730 euro for the third C4-offer from another university, and roughly 75% of this if a counter-offer of the home university was accepted (Detmer and Preissler, 2006; Preissler, 2006; Dilger, 2013). By comparison, the average attraction and retention bonus in the W-pay system had already grown to 1187 euro per month in 2006, and the average on-the-job performance bonus to 1649 euro (BMI, 2007). Furthermore, only a small fraction of professors qualified for and received bonuses under the age-related pay system.
Handel (2005), for instance, calculates, using data from the Ministry of Science and Culture in Niedersachsen, that only 16.5% of professors received attraction or retention bonuses in the age-related pay system. In contrast, any tenured professor in the performance pay system can receive bonuses, and in 2006 already 77% of professors in the

⁸Using pay tables valid as of August 2004 (Hochschullehrerbund, 2009).

performance pay scheme did receive bonuses (BMI, 2007). Consequently, only 3.55% of the total professorial pay volume was spent on attraction and retention bonuses in the age-related system, before the reform, while an estimated 26% of the professorial pay volume was available for performance bonuses under the performance pay scheme immediately after the reform (Handel (2005), using data from Expertenkommission (2000)). Combined with the fact that, at most ages, the basic wage is lower in the performance pay system than in the age-related system, this means that a larger portion of professorial pay depends on performance and there is a greater spread in professorial pay in the W-pay system. The W-pay system therefore offers higher-powered performance incentives than the old, age-related pay system.

2.3 Implementation

The federal law introducing the new professorial pay scheme was passed by Germany’s parliament in February 2002 and applies to all public institutions of higher education. The law required all states to implement the reform within their respective jurisdiction by 1 January 2005 at the latest, and only three states (Bremen, Niedersachsen and Rheinland-Pfalz) did so before the end of 2004 (Detmer and Preissler, 2005). Hence any new or renegotiated professorial contract entered into as of 1 January 2005 falls under the performance pay scheme, while existing contracts continue to fall under the age-related pay scheme. Importantly, once a professor switches to performance pay, they can never go back to age-related pay (Detmer and Preissler, 2004). Professors do not have to wait for outside offers in order to be able to renegotiate their contract and switch to performance pay; they can opt into the performance pay scheme at any time after the reform’s implementation. Preissler (2006), however, reports that few professors have exercised this option. Appendix A2 provides additional institutional details.
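The assignment rule implied by this implementation timing can be written down as a short sketch. The function names `pay_scheme` and `effort_cohort` are hypothetical, and the sketch assumes a single implementation date of 1 January 2005; the three early-adopting states would need their own earlier dates, which are not given in this section.

```python
from datetime import date

# Implementation deadline stated in the text. Early-adopting states
# (Bremen, Niedersachsen, Rheinland-Pfalz) would need their own dates.
DEFAULT_IMPLEMENTATION = date(2005, 1, 1)


def pay_scheme(contract_dates, as_of, implementation=DEFAULT_IMPLEMENTATION):
    """Return "W" (performance pay) or "C" (age-related pay) on `as_of`.

    contract_dates: dates on which the professor signed or renegotiated
    a contract. One contract dated on or after `implementation` (and no
    later than `as_of`) moves the professor to W-pay for good, because
    switching back to C-pay is not possible.
    """
    switched = any(implementation <= d <= as_of for d in contract_dates)
    return "W" if switched else "C"


def effort_cohort(first_tenure_date, implementation=DEFAULT_IMPLEMENTATION):
    """Cohort label for the effort analysis, by start of first tenured post."""
    return "treated" if first_tenure_date >= implementation else "control"
```

For example, `pay_scheme([date(2003, 10, 1)], as_of=date(2008, 1, 1))` yields `"C"`, while adding a renegotiation on `date(2006, 4, 1)` to the list yields `"W"`. Treating the contract date and the start of the tenured affiliation as the same event is a simplification made here.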

3 THEORETICAL FRAMEWORK

The model outlined here and presented in detail in Appendix section A1.2 illustrates how a combination of career concerns and explicit performance pay bonuses affects effort when effort and output have two dimensions: quantity and quality. The model builds on the multi-tasking model of Holmstrom and Milgrom (1991) and the career concerns model with explicit performance incentives in Gibbons and Murphy (1992). The combination of market-based wages reflecting perceived ability and explicit performance bonuses aligns with the structure of the performance pay scheme introduced in German academia, with its attraction and retention bonuses negotiated in contract talks and additional on-the-job bonuses. It is also representative of pay structures in academia and knowledge creation jobs more generally (Bonatti and Hörner, 2017), as well as managerial jobs (Gibbons and Murphy, 1992) and professional jobs such as in law (Ferrer, 2016), finance (Chevalier and Ellison, 1999; Hong, Kubik and Solomon, 2000) and software development (Lerner and Wulf, 2007). Dewatripont, Jewitt and Tirole (1999) also study a career concerns model with multitasking, but their analysis centers around total effort across

equally noisy output signals, while this model focuses on effort allocation across tasks (quantity and quality) that differ in the noise with which their output is measured.

Consider a labor market in which risk neutral principals interact with infinitely lived, risk averse agents. Each period $t = 0, 1, 2, \dots$, agents choose effort $\vec{e}_t = (e_{p,t}, e_{q,t}) \geq 0$ and produce output $\vec{y}_t = (y_{p,t}, y_{q,t})$. Effort cannot be observed by principals, but output is observable to all market participants. Output has two dimensions, quantity $y_{p,t}$ and quality $y_{q,t}$, each of which is a noisy signal of agent ability $\theta$ and effort put towards the respective output dimension: $y_{p,t} = \theta + e_{p,t} + \varepsilon_t$ and $y_{q,t} = \theta + e_{q,t} + \nu_t$. Here $\varepsilon_t, \nu_t$ are iid sequences of normally distributed noise terms: $\varepsilon_t \sim N(0, \sigma_\varepsilon^2)$, $\nu_t \sim N(0, \sigma_\nu^2)$, $\sigma_\varepsilon^2 > 0$, $\sigma_\nu^2 > 0$. I assume that the quality measure is more noisy, $\sigma_\varepsilon^2 < \sigma_\nu^2$.

Ability is not known either by agents or principals, but there is common knowledge about the prior ability distribution. In particular, an agent’s ability $\theta_i$ is an iid draw from a normal distribution with mean $m_{i,0} \in [\underline{m}, \overline{m}]$, $\underline{m} \geq 0$ and variance $\sigma_0^2 > 0$. I allow for abilities to be drawn from distributions with different means, to reflect the possibility that agents can distinguish themselves before entering the labor market, for instance during their studies. Agents have CARA preferences with risk aversion parameter $r$. Their cost of effort is multivariate quadratic, specifically

$$c(\vec{e}_t) = \frac{1}{2}\left[\vec{e}_t - \vec{\bar{e}}\right]^T C \left[\vec{e}_t - \vec{\bar{e}}\right] = \frac{1}{2}\left[\vec{e}_t - \vec{\bar{e}}\right]^T \begin{bmatrix} c & d \\ d & c \end{bmatrix} \left[\vec{e}_t - \vec{\bar{e}}\right] \qquad (1)$$

Inclusion of $\vec{\bar{e}}$ follows Holmstrom and Milgrom (1991) and ensures positive effort levels even in the absence of performance pay. These minimum effort levels capture other mechanisms and institutions that drive effort, such as intrinsic motivation or minimum output requirements (e.g. tenure requirements). For simplicity, and without loss of generality, I set $c = 1$ and, to ensure concavity of the relevant optimization problems, I assume that $0 < d < 1$ so that $C$ is positive definite. This means that effort towards quantity $e_{p,t}$ and effort towards quality $e_{q,t}$ are substitutes.

I allow for the degree of substitution to vary by ability class, $d = d(m_0)$. In particular, I follow Rubin, Samek and Sheremeta (2018) in assuming that quantity and quality effort are weaker substitutes for agents in higher ability classes, $\partial d / \partial m_0 < 0$.

Under flat wage contracts, when all agents are paid the same wage regardless of output, all agents optimally provide minimal effort: $e^*_{p,t} = \bar{e}_p$ and $e^*_{q,t} = \bar{e}_q$. I contrast this against a performance pay system that comprises both explicit performance contracts and career concerns. In particular, consider a perfectly competitive labor market in which only short-term contracts are feasible. As in Gibbons and Murphy (1992), I restrict attention to linear contracts of the form $w_t(\vec{y}_t) = c_t + \vec{b}_t^T \vec{y}_t$. This not only ensures tractability, but Holmstrom and Milgrom (1991) also show that optimal contracts are linear in a setting with comparable assumptions about agent preferences and output noise. In each period, the timing of actions is as follows: principals offer a contract ($w_t$) to agents; agents pick the contract that yields the highest expected utility; agents

choose and exert effort; output materializes, principals receive the output produced by agents they employ and agents are paid according to their contract terms. Proposition 1 in Appendix 1 shows that equilibrium effort under this performance pay system is given by:

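$$e^*_{p,t} = \bar{e}_{p,t} + \frac{(1 - d) + r\left(\sigma_\nu^2 - d\,\sigma_\varepsilon^2\right)}{(1 - d^2)\,D}\left(1 + r\,CC_t\,\sigma_\varepsilon^2 \sigma_\nu^2\right) \qquad (2)$$

$$e^*_{q,t} = \max\left[0,\; \bar{e}_{q,t} + \frac{(1 - d) + r\left(\sigma_\varepsilon^2 - d\,\sigma_\nu^2\right)}{(1 - d^2)\,D}\left(1 + r\,CC_t\,\sigma_\varepsilon^2 \sigma_\nu^2\right)\right] \qquad (3)$$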

where $CC_t = 2\sum_{\tau=0}^{\infty} \delta^\tau \frac{\sigma_0^2}{\sigma_\varepsilon^2 \sigma_\nu^2 + (t + \tau)\,\sigma_0^2\,(\sigma_\varepsilon^2 + \sigma_\nu^2)}$ is the career concerns effect, $D := 1 + r\left(2\sigma_t^2 + \sigma_\varepsilon^2 + \sigma_\nu^2\right) + r^2\left(\sigma_t^2(\sigma_\varepsilon^2 + \sigma_\nu^2) + \sigma_\varepsilon^2 \sigma_\nu^2\right)$ and $\sigma_t^2 := \mathrm{var}(\theta_t) = \frac{\sigma_0^2 \sigma_\varepsilon^2 \sigma_\nu^2}{\sigma_\varepsilon^2 \sigma_\nu^2 + t\,\sigma_0^2\,(\sigma_\varepsilon^2 + \sigma_\nu^2)}$. Proposition 2 in the Appendix provides testable implications for the effort and selection effects of this performance pay system, as compared to the flat wage system. I summarize these here. First, effort towards quantity is unambiguously larger under performance pay, while quality effort is higher only for high ability agents and only if quantity and quality are sufficiently weak substitutes or the quality measure is not too noisy. If the quality measure is very noisy, no agent increases quality effort. Thus, while quantity is expected to increase unambiguously upon the introduction of performance pay, what happens to quality is ultimately an empirical question. The second implication relates to heterogeneous effort responses. High ability agents increase quality the most or, if the output measure is very noisy, decrease it the least. Low ability agents may end up at a corner solution, where they put no effort towards quality. If the minimum effort level these agents (are required to) put towards quality absent performance pay is very low, the difference in quality effort of low ability agents between pay systems is nil or very small. As for quantity effort, if the quality measure is not too noisy, both low and high ability agents increase their effort levels the most, with intermediate ability classes the least responsive. However, if the quality measure is very noisy, such that no agent increases quality, the quantity response decreases monotonically with agent ability. Effort response heterogeneity by ability type is therefore an empirical question as well. Thirdly, if some, but not all agents select into performance pay, only high ability agents select into performance pay.
The ability cut-off for such selection is higher when flat wages are higher. The same selection results apply to selection into academia and labor markets with similar incentive structures as compared to an outside option that provides a given baseline utility. Appendix 1 also shows that equivalent implications hold in the presence of career concerns incentives only. The effort and selection effects derived here are thus quite general and apply to incentive structures commonly found in knowledge work and professional jobs.
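To make the comparative statics concrete, the following is a minimal numerical sketch of the equilibrium effort expressions (2) and (3). It assumes the reading of those equations in which the common factor is $(1 + r\,CC_t\,\sigma_\varepsilon^2\sigma_\nu^2) / ((1 - d^2) D)$; the function names, the truncation of the infinite sum and all parameter values are illustrative assumptions, not taken from the paper.

```python
def posterior_variance(t, s0, se, sn):
    """sigma_t^2: variance of the ability belief after t periods (inputs are variances)."""
    return (s0 * se * sn) / (se * sn + t * s0 * (se + sn))


def career_concerns(t, s0, se, sn, delta, horizon=2000):
    """CC_t, with the infinite discounted sum truncated at `horizon` terms (an assumption)."""
    return 2.0 * sum(
        delta ** tau * s0 / (se * sn + (t + tau) * s0 * (se + sn))
        for tau in range(horizon)
    )


def equilibrium_effort(t, d, r, s0, se, sn, delta, e_bar_p, e_bar_q):
    """Quantity and quality effort under performance pay, following (2) and (3)."""
    st = posterior_variance(t, s0, se, sn)
    D = 1 + r * (2 * st + se + sn) + r ** 2 * (st * (se + sn) + se * sn)
    cc = career_concerns(t, s0, se, sn, delta)
    scale = (1 + r * cc * se * sn) / ((1 - d ** 2) * D)
    e_p = e_bar_p + ((1 - d) + r * (sn - d * se)) * scale
    # Quality effort is bounded below by zero (the corner solution in the text).
    e_q = max(0.0, e_bar_q + ((1 - d) + r * (se - d * sn)) * scale)
    return e_p, e_q


# Illustrative values only: a noisy quality measure with strong substitution ...
noisy = equilibrium_effort(t=0, d=0.5, r=1.0, s0=1.0, se=1.0, sn=4.0,
                           delta=0.9, e_bar_p=1.0, e_bar_q=1.0)
# ... versus a mildly noisy quality measure with weak substitution.
mild = equilibrium_effort(t=0, d=0.2, r=1.0, s0=1.0, se=1.0, sn=1.2,
                          delta=0.9, e_bar_p=1.0, e_bar_q=1.0)
```

In the first parameterization quantity effort rises above its minimum while quality effort falls below it; in the second both rise, in line with the first implication summarized above.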

4 EMPIRICAL ANALYSIS

4.1 Data Description

In order to test the theoretical implications for the effort and selection effects of performance pay, I constructed an individual level panel data set that encompasses the affiliations of the universe of academics in German academia for the years 1999-2013, as well as their publication records from 1993 onwards. The data set also provides personal information such as the year in which an academic completed their PhD, obtained their postdoctoral qualification, and started working in academia, as well as an academic’s gender, birth year and, if applicable, year of passing. All in all, the panel encompasses 50,174 academics who held a tenured position at a German public university at some point between 1999 and 2013. I restrict attention to public universities only, of which there are 89 in Germany in the years spanned by the panel (HRK, 2014). I discard higher education institutions other than universities, because I focus on research productivity as outcome variable, and the research output of universities is not comparable to that of other higher education institutions (BMBF, 2002). I further restrict attention to academics who hold a tenured affiliation at some point in the sample period, because performance bonuses can be earned in tenured positions only. To construct the individual panel data set, I draw from three main, mostly unstructured, input data sets: Kuerschners Deutscher Gelehrten Kalender and Forschung & Lehre Magazine for affiliations, and ISI Web of Science for publications data. Kuerschners Deutscher Gelehrten Kalender (hereafter: DGK) is a comprehensive encyclopedia of academics affiliated with German universities (De Gruyter, 2006, 2008, 2013).
I use DGK as a register of the universe of academics affiliated with German universities and extract academics’ personal information (full name, birth date, year of passing, gender) as well as professional information (academic affiliation at different points in time, start year of academic career in Germany, end year of academic career in Germany, self-reported information on career history) from it. I supplement the information in DGK regarding the timing of affiliation changes and the obtainment of postdoctoral qualifications with information from Forschung & Lehre Magazine (hereafter: FuL) (DHV, 1999-2013). FuL is Germany’s largest academic professional magazine. Every month, it publishes an overview of scholars who obtained their post-doctoral qualification (habilitation), as well as professorial offers that were extended, accepted or rejected. Finally, I extract publication records for the academics in my data set for the years 1993-2012 from the ISI Web of Science database to construct measures of research productivity (Clarivate Analytics, 1993-2012a), and I use journal impact factors taken from the ISI journal citation report (JCR) of the year of publication9 (Clarivate Analytics, 2000-2012b) to construct impact factor-rated publication measures. I match academics across the three input data sets on the basis of last name, initials and field, and discard any resulting duplicate matches. I further improve the matching by, for instance,

9Due to data availability limitations, I have ISI JCR data for the years 2000-2013 only. I therefore use the average of the impact factors from JCR 2000 through JCR 2004 to weight publications before 2000.

exploiting additional information such as the start or end date of academic careers to rule out implausible matches. Doing so yields an 83% match rate of academics whom FuL reports as having a tenured affiliation at a German university to academics listed in DGK. Differences in the spelling of names, typos and erroneous information regarding affiliation changes in FuL mostly explain the 17% that I cannot match. Where possible, I resolve such inconsistencies manually. I have direct information on the timing of half the affiliation changes10 in my panel data set from FuL. For the other half of affiliation changes, DGK provides the year of change in 23% of the cases11 and I infer the timing of the remaining affiliation changes from academic affiliations listed in DGK at different points in time, the year they passed their habilitation as well as the start and end year of their academic career in Germany recorded in DGK. A detailed description of the construction of the data set used for the analyses below is provided in Section A3 of the Appendix.
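The matching step described above (keying on last name, initials and field, then discarding duplicate matches) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the record layout, the field names `last_name`, `initials` and `field`, and the normalization rules are all hypothetical.

```python
from collections import defaultdict


def match_key(record):
    """Key on last name, initials and field; the normalization is an assumption."""
    initials = "".join(ch for ch in record["initials"] if ch.isalpha()).upper()
    return (record["last_name"].strip().lower(), initials, record["field"].strip().lower())


def unique_matches(left, right):
    """Pair records that share a key, dropping keys that are ambiguous on either side."""
    right_index = defaultdict(list)
    for rec in right:
        right_index[match_key(rec)].append(rec)
    left_counts = defaultdict(int)
    for rec in left:
        left_counts[match_key(rec)] += 1

    pairs = []
    for rec in left:
        key = match_key(rec)
        candidates = right_index.get(key, [])
        # Keep only one-to-one matches; duplicates are discarded, as in the text.
        if left_counts[key] == 1 and len(candidates) == 1:
            pairs.append((rec, candidates[0]))
    return pairs


# Hypothetical records for illustration.
ful = [
    {"last_name": "Meier", "initials": "A.B.", "field": "Chemistry"},
    {"last_name": "Schmidt", "initials": "K.", "field": "Physics"},
]
dgk = [
    {"last_name": "meier ", "initials": "AB", "field": "chemistry"},
    {"last_name": "Schmidt", "initials": "K", "field": "Physics"},
    {"last_name": "Schmidt", "initials": "K.", "field": "physics"},
]
matched = unique_matches(ful, dgk)
```

Here only the Meier record is matched; the Schmidt record has two candidates and is dropped, which is where additional information such as career start and end dates would be needed to disambiguate.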

4.2 Effort Effect

In order to identify the pure effort effect of the introduction of performance pay in German academia on knowledge creation, I use the fact that any contract for a professorial position at a public university in Germany signed or renegotiated as of 1 January 2005 necessarily falls under the performance pay scheme, whereas any contract signed before this date falls under the old, age-related pay scheme12. Accordingly, academics who start their first tenured affiliation before 2005 continue to fall under the age-related pay scheme, whereas academics who start their first tenured affiliation after 2004 switch to the performance pay scheme upon starting their first tenured position. If the timing of the start of the first tenured position is exogenous, the performance incentives that first-time tenured affiliates face are exogenous as well. I can then identify the causal effort effect of performance pay on knowledge creation by comparing the change in research productivity from before to after the pay reform of academics who start their first tenured affiliation before 2005 (the control group) with the change in research productivity of academics who start their first tenured affiliation as of 1 January 2005 (the treatment group). Unless indicated otherwise, I use a three-year window before and after the reform to define the treatment and control group for the analyses below in order to abstract from seniority effects. Thus the treatment group consists of academics who start their first tenured position at a public university in 2005, 2006 or 2007, while the control group consists of academics who start their first tenured affiliation at a public university in 2002, 2003 or 2004. Results are however robust to extending or reducing the cohort window (Appendix Table A.3 panels E and F). I exclude academics who hold a foreign affiliation before their first tenured affiliation in Germany to avoid

10Where at least one of the affiliations concerns a tenured position at a German university.
11This is self-reported career information and hence may introduce bias in my data set. I therefore use the information regarding affiliation changes provided in FuL whenever available. Reassuringly though, a consistency check revealed that the information in DGK regarding the timing of affiliation changes differs from that in FuL for only 5% of the individuals who change a (tenured) affiliation at least once.
12With the exception of Bremen, Niedersachsen and Rheinland-Pfalz, which introduced performance pay before this deadline (in 2003 and 2004, respectively) (Detmer and Preissler, 2005). Note that using 2005 as uniform before-after cut-off yields a conservative measure of the effort effect, since some of the control group is, in fact, already treated before this time.

confounding the effort effect with selection effects of performance pay. The treated cohort comprises 2,844 academics, the control cohort 3,197.
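The cohort definition above can be written as a short assignment rule. This is a minimal sketch assuming each academic is summarized by the year of their first tenured affiliation and a flag for a prior foreign affiliation; the function and argument names are hypothetical.

```python
def assign_cohort(first_tenure_year, had_foreign_affiliation=False,
                  window=3, reform_year=2005):
    """Return 'treatment', 'control' or None (excluded) for one academic.

    Treatment: first tenured position in the `window` years from the reform year on.
    Control:   first tenured position in the `window` years before the reform year.
    Academics with a foreign affiliation before first tenure are excluded.
    """
    if had_foreign_affiliation:
        return None
    if reform_year <= first_tenure_year < reform_year + window:
        return "treatment"
    if reform_year - window <= first_tenure_year < reform_year:
        return "control"
    return None


cohorts = {year: assign_cohort(year) for year in range(2001, 2009)}
```

Changing `window` corresponds to widening or narrowing the cohort window in the robustness checks.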

4.2.1 Descriptive Statistics

For the effort effect analysis I focus on measures of research productivity that are based on the publications of academic $i$ in field $f$ in year $t + x_f$, where $x_f$ denotes the average publication lag in field $f$, rounded up to the nearest year. The average publication lags are taken from Björk and Solomon (2013) and range from 8 months for Chemistry to 18 months for Economics and Business. Correspondingly, I backdate publications by one to two years. After correcting for average publication lags I have productivity measures for (at least) 18 years, from 1993 through 2010; from 9 years before the announcement of the reform until 9 years after, and from 12 years before the implementation of the reform until 6 years after. The productivity measures take all available publication types into account, from journal articles to books and from book chapters to conference proceedings, with the exception of citations, which are available only for articles in the journals indexed by Web of Science and which do not include citations to books, chapters, patents, etc. Table 1 reports summary statistics for the treatment and control cohort. On average, academics in the treatment and control cohort publish almost 3 papers per year. Weighting publications by the two-year impact factor of the outlet in which they appear brings this sum to almost 10. To put this in perspective, the latest two-year impact factor ratings, available for 2017, put the top five general interest journals in economics at an impact factor rating between 3.750 (Econometrica) and 7.863 (Quarterly Journal of Economics), while a top field journal like the Journal of Labor Economics had an impact factor rating of 3.607 (Clarivate Analytics, 2017). Impact-factor ratings do get considerably greater than this. Science had an impact factor rating of 41.058 and Nature of 41.577, for instance (Clarivate Analytics, 2017). The average total number of citations to publications from a given year is 102.
Citations data was extracted from ISI Web of Science in January 2019, so there are at least 6 years between the publication date and the time at which citations were counted for each publication. The distributions of these variables are highly skewed: the median academic does not have any publication in any given year, while the most productive academics produce orders of magnitude more work than the average academic by any of these measures.
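The backdating rule described in this section (average field lag rounded up to the nearest year) amounts to the following. The sketch uses only the two lags quoted in the text; the dictionary and function names are hypothetical.

```python
import math

# Average publication lags in months for the two fields quoted in the text.
AVG_LAG_MONTHS = {"Chemistry": 8, "Economics and Business": 18}


def lag_years(field):
    """Average publication lag in field, rounded up to the nearest full year."""
    return math.ceil(AVG_LAG_MONTHS[field] / 12)


def effort_year(publication_year, field):
    """Year to which a publication is backdated for the productivity measures."""
    return publication_year - lag_years(field)


chem_lag = lag_years("Chemistry")               # 1
econ_lag = lag_years("Economics and Business")  # 2
```

With publication records running through 2012, a two-year lag explains why the productivity measures are available through 2010.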

4.2.2 Baseline Difference-in-Differences

I estimate the effort effect in a parametric difference-in-differences model:

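$$E\left[Y_{i,f,t-x_f} \mid X_{i,f,t}\right] = \exp\Big[\alpha_i + \beta_1\,\text{Post'02} \ast \text{Treatment}_i + \beta_2\,\text{Tenure}_{i,j} \ast \text{Treatment}_i + \sum_{j=-7}^{7} ttt_{i,j} + \gamma_t\Big] \qquad (4)$$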

The dependent variable, $Y_{i,f,t-x_f}$, is a productivity measure of academic $i$ in field $f$ in year $t - x_f$, where $x_f$ denotes the average publication lag in field $f$ as defined above. The Treatment variable is 1 for academics who start their first tenured affiliation at a public university in 2005, 2006 or 2007, and 0 for those who start their first tenured affiliation at a public university in 2002, 2003 or 2004 (the control cohort). The variable Post'02 is 1 as of 2002 and 0 beforehand, and the variable Tenure is 1 as of the year in which an academic starts their first tenured affiliation at a public university and 0 beforehand. The $ttt_{i,j}$ variables are time-to-tenure dummies. They control flexibly for the seven years before and after academics start their first tenured position, as well as the tenure year itself13. I also include individual fixed effects, $\alpha_i$, and calendar year fixed effects, $\gamma_t$. Taken together, these fixed effects control flexibly for calendar year fixed effects, cohort-specific relative time fixed effects, and individual academic fixed effects14. I estimate the model as a conditional quasi-maximum likelihood fixed-effect Poisson model15, because the dependent variables are highly skewed with a large mass at zero and long right tail. The corresponding estimation results are shown in Table 2. Robust standard errors, clustered at the individual level, are reported throughout. This difference-in-differences specification distinguishes two before and after periods. The Post'02 variable is included to pick up on the effect of career concerns incentives that take effect as of the announcement of the reform, while the Tenure variable captures the effect of the explicit on-the-job performance bonuses that kick in upon entry into the performance pay scheme. The moment the reform is announced, the lure of future attraction and retention bonuses and, consequently, the career concerns incentives of the performance pay scheme take effect.
Because tenure-track positions generally do not exist in Germany, academics need to move to a new university and negotiate a new contract to obtain tenure. Academics who anticipate starting their first tenured affiliation in the performance pay system therefore face strong career concerns, as their pre-tenure performance can influence their tenure contract negotiations and tenure pay. I can thus identify the effort effect of career concerns off of the differential change in productivity of about-to-be-tenured academics here. The incentive effect of the explicit on-the-job performance bonuses takes effect only after academics enter into the performance pay scheme. This entry into the performance pay scheme coincides with the start of the first tenured affiliation for academics in the treated cohort16. The $\text{Post'02} \ast \text{Treatment}_i$ and $\text{Tenure} \ast \text{Treatment}_i$ interaction terms taken together therefore provide a difference-in-differences estimate of the total effort effect of career concerns and explicit performance incentives in knowledge creation. Because of the

13Including 7 year-to-tenure dummies aligns with the institutional setting here, as the median number of years between the end of the PhD and the completion of the habilitation is 7 and academics are, traditionally, required to have completed their habilitation before they become eligible for a tenured affiliation. Results are however robust to including other sets of year-to-tenure dummies (Table A.3 Panel G and H).
14Note that individual fixed effects subsume academic field fixed effects here, because an academic’s field is kept constant throughout.
15This is the same model as used in, for instance, Azoulay et al. (2015). Even though the dependent variables here are not all integers, Silva and Tenreyro (2006) show, using a result from Gourieroux, Monfort and Trognon (1984), that the estimator based on the Poisson likelihood function is consistent even for non-integer dependent variables, as long as the conditional mean is correctly specified.
16Universities generally announce either the number of on-the-job bonuses or the total amount of on-the-job bonus pay to be paid out in a given year at the beginning of that year (Lünstroth, 2011). These incentives thus do not just vary across universities, but by university and year. On top of that is the variation, at the individual academic level, in target agreements. Identifying the effort effect of the explicit performance incentives by exploiting cross-university variation is therefore not feasible, even aside from the potential bias due to sorting. I therefore estimate the average effort effect of explicit performance incentives here.

way the Post'02 and the Tenure variable are defined, the interactions of these variables with the Treatment variable capture persistent differential changes in the research productivity of the treated cohort relative to the control cohort. The announcement of the reform occurs at the same calendar time for all tenure cohorts, but at a different time relative to tenure. The start of the first tenured affiliation, on the other hand, occurs at the same relative time, but at a different calendar time for all cohorts. Because the specification includes individual fixed effects (which subsume cohort and group fixed effects), including all calendar time and relative time fixed effects would yield a specification that is underidentified, and result in the underweighting of long-run effects (see e.g. Borusyak and Jaravel (2017); Abraham and Sun (2018); Goodman-Bacon (2018)). Including at most 15 year-to-tenure (relative time) fixed effects for each cohort17, and thus dropping at least three relative time fixed effects for each cohort and treatment group, prevents the underidentification problem, while estimating treatment effects relative to a control group avoids underweighting long-term effects18 (Borusyak and Jaravel, 2017). The control group pins down calendar time and relative time, so that the (restricted) treatment group-specific calendar time and relative time fixed effects (the $\text{Post'02} \ast \text{Treatment}_i$ and $\text{Tenure} \ast \text{Treatment}_i$ interaction terms) in the specification estimate the effort effect of career concerns and explicit performance incentives, respectively, by allowing for differences in behavior of the treatment group around the points in calendar time and relative time when these incentives take effect.
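As an illustration of how the regressors in specification (4) are defined for a single academic-year observation, consider the sketch below. It only constructs the interaction terms and the restricted set of 15 time-to-tenure dummies; it does not estimate the Poisson model, and the function and key names are hypothetical.

```python
def did_regressors(year, first_tenure_year, treated, max_lag=7, announcement_year=2002):
    """Build the interaction terms and time-to-tenure dummies for one observation."""
    post02 = int(year >= announcement_year)   # 1 as of the reform announcement
    tenure = int(year >= first_tenure_year)   # 1 as of the first tenured affiliation
    relative_time = year - first_tenure_year
    # One dummy per relative year in [-max_lag, max_lag]; all zero outside that range.
    ttt = {j: int(relative_time == j) for j in range(-max_lag, max_lag + 1)}
    return {
        "post02_x_treatment": post02 * int(treated),
        "tenure_x_treatment": tenure * int(treated),
        "ttt": ttt,
    }


# A treated academic tenured in 2006, observed in 2003: career concerns are
# switched on, explicit on-the-job bonuses are not yet.
obs = did_regressors(year=2003, first_tenure_year=2006, treated=True)
```

Observations more than seven years away from tenure have all time-to-tenure dummies equal to zero, which reflects the restriction on relative time fixed effects that the text uses to avoid underidentification.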

4.2.3 Baseline Results

Table 2 shows that research quantity and quality-adjusted quantity increase in response to performance incentives, but the average quality decreases. The positive and significant (at 1%) Post'02 ∗ Treatment interaction implies that there is a persistent increase in the number of publications of the treated cohort by 18.3% after the announcement of the reform, when career concerns come into effect, relative to the control cohort19. There is no additional increase in the number of publications from tenure onwards, when treated academics enter into the performance pay scheme and explicit performance incentives take effect. To allow for a more easily interpretable result, I also estimate a linear fixed effects version of the baseline regression, the results of which are reported in Table A.2. The results in column 1 suggest that academics in the treated cohort produce almost one extra publication every three years after the announcement of the reform compared to the control cohort20. Moving from a pure quantity measure of productivity to measures of quality-adjusted quantity, I find comparable results (cf. columns 2

17I drop year-to-tenure fixed effects that are far from the time of treatment (here: tenure) as this normalization allows for a more stable estimation (see e.g. Borusyak and Jaravel (2017)). Furthermore, dropping only fixed effects far from tenure, and not tenure itself, makes for easier interpretation.
18The results are robust to including a non-linear function of relative time - the absolute value of time-to-tenure - instead of the restricted set of time-to-tenure fixed effects to prevent underidentification (Table A.3 Panel C), or including different sets of year-to-tenure dummies (Panels G and H in Table A.3).
19The exponentiated coefficients of the Poisson QML, minus one, can be interpreted as elasticities.
20The estimation results of the linear FE model should be interpreted with caution because the model is likely misspecified given the censored and highly skewed distribution of the dependent variables.

and 3 in Table 2). The number of publications of the treated cohort, weighted by impact-factor rating, increases by 14.2% after the announcement of the reform (significant at 1%), while the sum of citations to publications published in a given year increases by 13.8% relative to the control cohort (at 5% significance), with no further increase upon entry into the performance pay scheme. The average quality of the publications, however, decreases after the announcement of the reform. The average impact factor rating of publications produced by the treated cohort decreases by 9% (significant at 1%), and the average number of citations decreases by a marginally significant 10% relative to the control cohort (columns 4 and 5 in Table 2)21. The coefficient estimates of the equivalent linear fixed effects estimation in columns 4 and 5 of panel A in Table A.2 allow for slightly easier interpretation. The average impact factor rating of publications of treated academics after the announcement of the reform decreases by 0.216. This means, for instance, that a treated academic whose average publication before the announcement of the reform appeared in the Journal of Political Economy, which had a 2-year impact factor rating of 5.247 in 2017, publishes in journals like the American Economic Journal: Applied Economics, which had an impact-factor rating of 5.028 in 2017 (Journal Citation Report 2017), on average after the announcement of the reform. The 10% decrease in the average number of citations in 2019 to publications by treated academics published after the reform announcement is equivalent to these publications having received on average 3.6 fewer citations than publications of the control cohort published in the same year. Additional analyses show that there is no significant decrease in either the maximum or minimum number of citations to the publications of treated academics (Panel A, Table A.1).
To get a better idea of the quality of the work produced in response to performance pay, I analyze the distribution of citations and impact factors next. I calculate the percentiles of citations and impact factor ratings separately by field and publication year and use the percentile cut-offs to generate quantile frequency variables for each author and publication year. To illustrate, if an author has three publications in a given year, one of which garners a number of citations that puts it in the top quartile of citations of publications in the same field and publication year, while the other two papers fall in the bottom quartile of citations, then the top quartile frequency variable is equal to 1, the bottom quartile frequency variable 2, and all other quantile frequency variables 0. I estimate the baseline model separately for all quantile frequency variables. The histograms in Figure 1 depict the resulting $\text{Post'02} \ast \text{Treatment}_i$ (grey bars) and $\text{Tenure} \ast \text{Treatment}_i$ (white bars) coefficient estimates and 95% confidence intervals. These figures clearly show that treated academics produce more of low to medium-high quality work, but not more of the highest quality research. In short, I find evidence of a positive and significant average effect of performance incentives

21The difference in the number of observations across columns in this table occurs for two reasons. While observations are missing for the average quality measures in years in which an academic does not have any publications, the quantity and quality-adjusted quantity variables have zero entries for those years. Further differences between columns arise because I have to drop authors for whom all observations are 0, or for whom I have only one observation in order to estimate the Poisson model.

on the total raw and quality-adjusted quantity of knowledge output. This response arises from the moment high-powered career concerns incentives take effect, with no additional increase in response to explicit performance bonuses. The effect size ranges from 14 to 18%, which is of the same order of magnitude as previous estimates of the effort response to performance incentives, albeit mostly explicit performance incentives for routine tasks, in the literature (e.g. Lazear (2000); Shearer (2004)). The extra output produced, however, is not of the highest quality, as only the number of publications produced in low to medium-high citation and impact-factor quartiles increases. Indeed, there is a significant decrease in the average quality of knowledge output of around 9 to 10% in response to the introduction of performance pay.
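The quantile frequency variables used for Figure 1 can be illustrated with the three-publication example from the text. The sketch below assumes quartile cut-offs have already been computed for the relevant field and publication year; the cut-off values, the tie-breaking convention (a value equal to a cut-off falls in the higher quartile) and the function names are assumptions for illustration.

```python
from bisect import bisect_right
from collections import Counter


def quartile_of(value, cutoffs):
    """Quartile (1 = bottom, 4 = top) given (p25, p50, p75) cut-offs for a field-year."""
    return bisect_right(cutoffs, value) + 1


def quartile_frequencies(citation_counts, cutoffs):
    """Number of an author's publications in each citation quartile for one year."""
    counts = Counter(quartile_of(c, cutoffs) for c in citation_counts)
    return {q: counts.get(q, 0) for q in (1, 2, 3, 4)}


# Hypothetical cut-offs and citation counts: one highly cited paper, two rarely cited.
freqs = quartile_frequencies([40, 1, 0], cutoffs=(2, 5, 12))
```

In this example the top quartile frequency is 1, the bottom quartile frequency is 2, and the two middle quartile frequencies are 0, matching the illustration in the text.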

4.2.4 Robustness

There is no evidence that the changes in productivity metrics are the result of strategic co-authorship behavior. The average number of co-authors on papers does not increase (Table A.1, Panel B) and the results of the baseline regression with dependent variables weighted by number of authors (so a paper with three authors counts for one-third) are very similar to the baseline results (Table A.2, Panel B). Papers also do not become significantly shorter (Table A.1, Panel B). Academics who are paid according to the age-related pay scheme can switch to the performance pay scheme after its implementation by changing affiliation or position, or by opting into the performance pay scheme while retaining the same position. Academics in the control cohort may therefore end up being treated as well. Any effort response of the control cohort would lead me to underestimate the effort effect of the treated cohort, so, if anything, the baseline results provide a conservative estimate of the effort effect. To test this, I re-estimate the baseline specification with a control group that excludes any switchers, where I label any academic who changes affiliation, position or contract22 after implementation of the pay reform (as of 2005) as a switcher. The effort effect estimates for output quantity and quality-adjusted quantity are indeed larger, ranging from a 17% increase in citations to a 23% increase in the number of publications, while the estimates of the effects on average quality are qualitatively the same (Table A.2 Panel C).
The effort effect results are also robust to restricting attention to articles and proceedings papers only23 (Table A.3, Panel A); widening or narrowing the treatment and control cohort windows (Panels E and F); including a Post05 ∗ Treatment_i interaction instead of the Tenure ∗ Treatment_i interaction to control for implementation instead of entry into the performance pay scheme (Panel D); or transforming the dependent variables using the inverse hyperbolic sine transform and estimating as a panel fixed effects model (Panel B).
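As a brief illustration of the inverse hyperbolic sine transform mentioned above: it is defined at zero, which matters when many academic-year observations have no publications, and it behaves like log(2y) for larger values. The example counts below are hypothetical.

```python
import math

def ihs(y):
    """Inverse hyperbolic sine: log(y + sqrt(y**2 + 1)); defined at y = 0."""
    return math.asinh(y)

# Hypothetical yearly publication counts, including a zero.
counts = [0, 1, 5, 50]
for y in counts:
    # For larger y, asinh(y) is close to log(2 * y); log is undefined at 0.
    approx = round(math.log(2 * y), 4) if y > 0 else None
    print(y, round(ihs(y), 4), approx)
```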

22 I assume that, whenever academics receive an outside offer, they either accept and change affiliation, or reject and renegotiate their current contract, and consequently switch to performance pay. If there are academics who do not at least renegotiate their contract when they receive an outside offer, this overestimates switching and leads to a conservative estimate here.
23 Specifically, I restrict attention to publications in the following ISI Web of Science categories only: “Article”, “Article: Book”, “Article: Book Chapter”, “Article: Proceedings Paper”, “Proceedings Paper”.

Finally, in supplementary materials (Appendix section A4.2) I show the results of an alternative identification strategy, using an instrumental variables approach to estimate the effort effect of the performance pay reform for academics who switch to the performance pay scheme. I instrument for endogenous switches into performance pay (by academics who start out in the age-related pay scheme) with age and age cut-offs that align with the single crossing points of the basic wage schedules of the performance pay and age-related pay schemes (cf. Figure A.1). The results are qualitatively similar, though concerns about the validity of the instruments suggest these results are indicative at best.

4.2.5 Validity of Identifying Assumption

The 14% to 23% increase in quantity and quality-adjusted quantity, and the 9% to 10% decrease in average quality can be interpreted as the causal effort effect of performance pay on knowledge creation if, absent the reform, the productivity of the treatment and control cohort would have evolved along parallel paths. There are a number of potential threats to identification: other events that occurred around the same time could be driving the result, or the timing of tenure could be endogenous. I discuss these concerns and how I address them below.

Any events that occur around the time of the pay reform but that do not affect the pre- and post-reform first-time tenured cohorts differentially are not a threat to identification. For this reason, the start of the “Excellenz Initiative”, a large funding initiative for universities and research centers as of late 2006/early 2007 (DFG, 2016), and the abolition of the professor’s privilege in 2002 (Von Proff, Buenstorf and Hummel, 2012)24 do not invalidate the identifying assumption. The introduction of the “Junior Professorship” in 2002 as an alternative path to professorships from the habilitation cannot be driving the earlier results either, because the first Junior Professors become eligible for a tenured position by 2008/9 at the earliest (Lutter and Schröder, 2016). The three-year window of the treatment and control cohorts therefore does not include first-time tenured professors who completed a Junior Professorship.

Nonetheless, I test for pre-existing trends to provide further assurance that other events are not driving the baseline results. To do so, I estimate the following model as a conditional QML Poisson fixed effects model:

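\[
E\left[ Y_{i,f,t-x_f} \mid X_{i,f,t} \right] = \exp\left[ \alpha_i + \sum_{k=1}^{15} \beta_k \, \mathit{ttt}_{i,k-8} \ast \mathit{Treatment}_i + \sum_{j=-7}^{7} \mathit{ttt}_{i,j} + \gamma_t \right] \tag{5}
\]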

Here, ttt_{i,k−8} ∗ Treatment_i denote interactions of 15 time-to-tenure dummies (from 7 years before to 7 years after the start of the first tenured affiliation) with the treatment variable. All other variables are as before. This specification effectively aligns the relative time (time-to-tenure) for different tenure cohorts, and allows me to estimate the differences in outputs of the treated and control cohorts as they move towards and beyond the start of their first tenured affiliation. The

24Furthermore, under the professor’s privilege regime, professors own the IPR of their inventions (Hvide and Jones, 2018). The abolition of this privilege should reduce incentives to produce commercializable (patentable) knowledge.

coefficient estimates and 95% confidence intervals of the interaction terms are depicted in Figure 2 for the baseline dependent variables. The green vertical dashed line at t − 5 indicates where in the tenure trajectory of the youngest academics of the treated cohort, who start their first tenured affiliation in 2007, the announcement of the pay reform occurs. The orange vertical dashed line at tenure marks the time at which treated academics enter into the performance pay scheme. All five figures display a similar pattern: the interaction terms, which capture the difference in the respective output measures between the treated and control cohort, show that these cohorts start to diverge after the announcement of the reform, when the treated cohort faces higher-powered incentives. Recall further that the publications have been backdated by the average publication lags in the respective academic fields (rounded up to the nearest year), so the differential increase in number of publications directly after the announcement of the reform is consistent with an immediate effort response to the higher-powered incentives it heralds. The response in quality-adjusted output and average quality occurs a bit later and more gradually, as expected if producing high quality research takes more time and is riskier. The absence of pre-existing trends and clear alignment of the productivity response with the timing of the announcement of the pay reform, and hence the moment when the treated cohort starts to face higher-powered incentives, lends support to the interpretation of the differential productivity response as the causal effort effect of the performance pay reform and not the effect of another event. The analysis also underlines that the effort response to the higher-powered incentives is not a temporary response, but one that persists.
The absence of pre-existing trends and the persistence of the effort effect also rule out that the effect of tenure itself is driving the results. Recall also that the time-to-tenure dummies in the baseline specification control for any common productivity changes in the run-up to and following the start of first-time tenured positions. The causal interpretation of the baseline results also requires that the timing of tenure is exogenous. In particular, academics could try to get a tenured affiliation sooner after they learn of the impending reform in order to avoid the performance pay system25. The absence of pre-existing trends for the treated and control cohorts lends support to the exogeneity of tenure timing and a placebo pre-trends regression as in equation 5 lends further support. For this placebo test, I restrict the sample to academics who start their first tenured position in 2001 to 2004 and use the cohort that starts their first tenured position at a public university in 2001 or 2002 as placebo control group and those who start their first tenured position in 2003 or 2004 as placebo treatment group. Figure 3 shows that the interaction terms are generally close to 0 and not significant and there is no evidence of any consistent differential trends. If academics in the placebo treatment group had been able to avoid entry into the performance pay scheme by temporarily stepping up their efforts to obtain a tenured affiliation sooner, the productivity differential between the placebo treatment and control cohorts would have been positive between

25Note that attempts to delay the start of a tenured position would not be rational and are therefore not much of a concern, since an academic would delay earning the higher pay associated with a tenured position, while they can always opt into the performance pay scheme after 2005 if so preferred.

the announcement of the reform and the start of the first tenured affiliation, and possibly negative thereafter. The absence of such a pattern thus lends support to the identifying assumption of the exogeneity of the timing of the start of the first tenured affiliation. As a further test, I show that the main results go through with synthetic treatment and control cohorts, which are defined by the average age at which academics start their first tenured affiliation rather than the actual timing of the first tenured affiliation (Table A.2 Panel D and Appendix Section A4.1). As a final validity test, I estimate the tenure probability using hazard rate analysis in Appendix Section A4.3. I find no evidence that the requirements for obtaining a tenured position increase with the reform. There is thus no evidence that the cohort of academics who start their first tenured position after the reform is more productive than academics in the control cohort, so this is not driving the results either.
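The time-to-tenure alignment behind the pre-trend tests of equation (5) can be sketched as follows. This is a minimal illustration of how the 15 interaction dummies could be constructed for one academic-year observation, not the estimation code itself; the function names and the treatment of observations outside the window are assumptions of the sketch.

```python
def time_to_tenure(year, tenure_year):
    """Relative time: 0 in the year the first tenured affiliation starts."""
    return year - tenure_year

def interaction_row(year, tenure_year, treated, window=7):
    """Return the 15 time-to-tenure x treatment dummies for one academic-year.

    Index k = 1..15 corresponds to relative time k - 8 (from -7 to +7),
    following the indexing of equation (5). Observations outside the
    window get all zeros in this sketch.
    """
    rel = time_to_tenure(year, tenure_year)
    return [int(treated and rel == k - (window + 1))
            for k in range(1, 2 * window + 2)]

# Hypothetical treated academic whose first tenured affiliation starts in 2007,
# observed in 2002, i.e. at t - 5.
row = interaction_row(2002, 2007, treated=True)
print(row.index(1) + 1)  # position k of the active dummy
```

Aligning observations on relative time in this way lets cohorts with different tenure years be compared at the same point in their tenure trajectory.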

4.2.6 Heterogeneous responses by academic field

The baseline results show how performance pay affects research productivity on average. This section delves into the anatomy of the effort response and tests if and how the effort response differs by academic field, while the next section estimates responses by productivity quantile. Effort responses may differ by field for a number of reasons. For one, research teams in the natural and applied sciences tend to be much larger than those in the social sciences and humanities26. The response to higher-powered incentives may be larger in smaller teams, if the likelihood that all or most team members are highly incentivized is larger in smaller teams. On the other hand, in large teams the benefit of good management may have stronger effects, so that highly incentivized team leaders may be able to bring about larger changes in output. Furthermore, fields may differ in the noise in output quality measures. To the extent that there are more objective quality measures in the natural and applied sciences than in the social sciences and humanities, quality measures may be more noisy in the latter fields. The theoretical model presented above shows that greater quality measure noise is associated with a quality effort response that is smaller positive or larger negative. To study heterogeneous effort responses by academic field, I estimate the baseline regression separately by broad academic field: natural and applied sciences, and social sciences and humanities. I classify mathematics, physics and informatics, biology, chemistry, earth sciences, pharmacology, engineering, medicine, dentistry, veterinary, agricultural science and nutrition science as natural and applied sciences; and theology, philosophy and history, philology and anthropology, law, economics and other social sciences as social sciences and humanities. Figure 4 depicts the estimation results of the baseline regression for these broad fields separately. 
Quantity effort increases in both broad fields in response to performance pay, by 17% in the natural and applied sciences and 31% in the social sciences and humanities. The difference between these effort responses is not statistically significant in a pooled regression with interaction

26As gauged by the average number of authors on a paper in a field (calculated over the pre-reform years 1996-2000).

terms with broad field indicators (Table A.4). Yet, where quality-adjusted measures of output increase significantly in the natural and applied sciences, there is no significant increase in these same measures in the social sciences and humanities and the average impact factor-rating of publications decreases less in the former, though this difference is only marginally significant. That is, while I find no evidence of a significant difference in the level of the quantity effort response, there is some evidence suggesting the quality effort response may be more negative in the social sciences and humanities, which would align with quality measures being noisier in these fields.

4.2.7 Heterogeneous responses by productivity quantile

The theoretical model predicts that the quantity effort response decreases with ability type, with a potential uptick for the highest ability levels if the quality signal is not too noisy. Quality effort on the other hand is expected to increase the most in high ability agents or, if the quality signal is very noisy, decrease the least. Low ability agents are expected to decrease quality effort the most or, if they already exert the minimum possible level of quality effort, not change quality. I test these hypotheses in turn below. I determine productivity quantiles separately by academic field and treatment group on the basis of the averages of the impact factor-rated number of publications published in 1999, 2000 and 2001, using pre-announcement averages to avoid simultaneity bias. Because the productivity distributions are highly right-skewed, with the median academic not publishing a paper or receiving any citations in an average year (cf. Table 1), I look at above-median academics separately by decile and below-median academics as one group. The histograms in Panels A through F of Figure 5 depict the Post02 ∗ Treatment_i (grey bars) and Tenure ∗ Treatment_i (white bars) coefficient estimates and 95% confidence intervals of baseline regressions run separately for academics in the top five deciles and those below the median. Both low and high productivity academics increase pure quantity and quality-adjusted quantity in response to performance pay. There is a 24% increase in the number of publications and a 31% increase in impact factor-rated number of publications in response to career concerns incentives in the below-median treatment group relative to the same quantile in the control group. The top decile and 7th decile also increase the number of publications, by 22% (Top 10%) and 24% respectively, as well as the sum of citations to publications, by 32% and 39%.
There is no significant response in the 9th, 8th and 6th decile, though the lack of statistical significance in the 6th decile is likely due to the small decile size because of the aforementioned skewness in the distribution of the productivity variables. This quantity effort response constitutes a positive and significant intensive margin response for top decile academics only (Fig A.2). Conditional on having at least one publication in a given year, their number of publications increases by 20% and citations to publications by 30%. In contrast, the effort response of lower deciles is solely an extensive margin response. The probability that an academic has at least one publication in a given year increases for

below-median academics as well as in all but the highest two above-median deciles (Fig A.3). The separate quantile regressions compare responses in the same quantile of the treatment and control group but cannot test whether the effort response differs across quantiles. To test the latter, I estimate the baseline regression model augmented with interaction terms with indicator variables for the top five deciles. Table 3 shows that the differences in effort responses are, in fact, significant. The coefficient of the Post02 interaction with the Treatment variable in column 1 implies that the below-median productivity academics of the treatment group produce 32% more publications in response to career concern incentives, relative to the same quantile in the control group. The triple interactions of Post02, Treatment and indicator variables for the 10th and 9th decile imply that the effort responses of these deciles of the treated group are, respectively, 14% and 28% less than the effort response of below-median academics (significant at the 10% and 1% level, respectively). Moreover, simple Wald tests for equality of the Post02 ∗ Treatment ∗ decile interactions show that the 9th decile interaction is significantly different from all other interactions, so the quantity effort response of the sub-top productivity decile is less positive than that of both higher and lower productivity quantiles. A similar pattern emerges for the impact factor-rated number of publications, and for the sum of citations the effort response in both the 8th and 9th decile is significantly less positive than in all other quantiles. Results are robust to using deciles based on pre-announcement averages of the sum of citations, as well as to excluding academics who switch to performance pay after 2005 (Table A.5), so there is no evidence that differential selection into performance pay is driving these heterogeneous treatment effects.
Taken together, these results show that the pure quantity and quality-adjusted quantity effort response is largest for relatively low productivity (below-median) academics and smallest for academics just below the top of the productivity distribution. Turning to the quality effort response, I find that academics in the sub-top productivity deciles decrease quality effort, while quality remains unchanged in lower and higher productivity quantiles. Figure 5 shows that the average number of citations decreases significantly in the 8th decile, and the intensive margin response in the sum of citations is significantly negative for the 9th decile (Fig A.3), but there is no significant change in either metric in other deciles. The response histograms in Figure Panel B show that both these sub-top productivity deciles produce fewer papers in the top citation decile, a reduction of 25% and 32% respectively, thus the quality of their work decreases. There is no evidence that other productivity deciles change quality effort. There are increases in the lowest citation quartile bin for below-median productive academics and in higher citation bins for the most productive academics, in line with their quantity effort response and commensurate with their ability class. But since there is no sign of substitution of higher citation decile bin papers for lower decile bins or vice versa, there is no evidence of changes in output quality for these productivity classes. These findings align with the theoretical predictions for a setting in which the quality signal is sufficiently noisy so that even the highest productivity agents do not increase quality effort, and the lowest productivity classes cannot reduce quality as they

already exert the minimum required level, leaving a decrease in quality effort for intermediate productivity levels only.
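The percentage effects quoted in this section can be related to regression coefficients with a short sketch. It assumes that the regressions are log-link (Poisson) models like equation (5), in which a coefficient b implies a proportional change of exp(b) − 1. The coefficient value below is hypothetical and is not taken from Table 3.

```python
import math

def pct_effect(coef):
    """Proportional change implied by a coefficient in a log-link (Poisson) model."""
    return math.exp(coef) - 1.0

def coef_for_pct(pct):
    """Coefficient that corresponds to a given proportional change."""
    return math.log(1.0 + pct)

# Hypothetical coefficient chosen to correspond to a 32% increase.
b = coef_for_pct(0.32)
print(round(b, 3), round(pct_effect(b), 2))
```

Under this assumption, small coefficients are close to the percentage change they imply, while larger coefficients understate it.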

4.2.8 Novelty and Impact

To provide further evidence on the quality and impact of the work produced in response to performance incentives, I perform textual analysis of paper abstracts. Specifically, I calculate cosine similarity measures of the Term Frequency Inverse Document Frequency vectors of publications and comparison papers, as in Kelly et al. (forthcoming). The Term Frequency Inverse Document Frequency (TFIDF) index is defined in the following way:

\[
\mathit{TFIDF}_{w,d,\tau} = \frac{c_{w,d}}{\sum_{l} c_{l,d}} \times \log\left( \frac{\text{number of documents published in period } \tau}{c_{d,\tau}} \right)
\]

where c_{w,d} denotes the count of a term w in document d (and equivalently for c_{l,d}) and c_{d,τ} denotes the count of documents that contain the term w and that were published in period τ. The first element of the expression is then the frequency of term w in document d (the “Term Frequency” part), while the second element is the log of the inverse of the frequency of documents in which term w appears (the “Inverse Document Frequency” part). A term that occurs more frequently in a document has a higher TFIDF, while more commonly used terms have a smaller TFIDF, because the second element is smaller. As in Kelly et al. (forthcoming), I use the “backward” TFIDF; that is, I use only publications published in the years before publication of the focal document to calculate the inverse document frequency. I calculate TFIDFs for the abstracts of all publications in the data set at the bigram (word pair) basis, rather than word by word.27 Before calculating the TFIDFs, I drop common stop words from abstracts (see Table A.10 in the Appendix for a list of the stop words that were removed). In order to measure the similarity of the bigrams used in the focal publication, relative to a set of comparison publications, I calculate the cosine similarity of the normalized vector of TFIDFs of the focal document and a comparison publication. Formally, for focal document d published in year t, the cosine similarity with comparison publication d̃ published in year t̃ is:

\[
\rho_{d,t;\tilde{d},\tilde{t}} = \left( \frac{\mathit{TFIDF}_{d,t}}{|\mathit{TFIDF}_{d,t}|} \right) \cdot \left( \frac{\mathit{TFIDF}_{\tilde{d},\tilde{t}}}{|\mathit{TFIDF}_{\tilde{d},\tilde{t}}|} \right)
\]

For each paper pair, I calculate the inverse document frequency based on the set of all papers in the same field published in all years prior to the earlier of the two publication dates (t, t̃)28. If the focal publication and comparison publication have abstracts that have no bigrams in common, the cosine similarity is 0. The more common bigrams in the focal publication and comparison

27 As an illustration, the sentence ‘Paul walks home’ has two bigrams: ‘Paul walks’ and ‘walks home’.
28 Note that, due to the way in which publication records were extracted from ISI Web of Science, namely, filtering by publications that have at least one author with a German (work) address, the set of comparison papers is not just from the same field, but also from authors in the same country.

publication, the closer the cosine similarity is to 1. As in Kelly et al. (forthcoming), I calculate two different metrics derived from these cosine similarities: backward similarity and forward similarity. The backward similarity of focal publication d published in t is calculated as the sum of the cosine similarities between the focal document and comparison publications published in a three year window before the focal document was published, while the forward similarity is calculated as the sum of the cosine similarities between the focal publication and comparison publications published in a three year window after the focal document was published. The more bigrams in the abstract of the focal publication that have not been used in abstracts of publications published earlier, the smaller the backward similarity metric. This might point to such publications being more novel. The forward similarity metric on the other hand captures the relatedness of follow-on research and, as such, constitutes an alternative measure of impact to citations. Kelly et al. (forthcoming) find that patents with a higher ratio of forward to backward similarity are more likely to be highly cited and have higher market value. This lends support to the notion that the TFIDF cosine similarity measures capture novelty and impact. Iaria, Schwarz and Waldinger (2018) employ Latent Semantic Analysis (LSA) of abstracts to generate measures of similarity of papers. LSA is a machine learning technique that takes into account whether words are commonly used in similar contexts. As such, it classifies titles with words that often occur in the same context as similar, even if the words are different. Because the TFIDF metric does not ‘learn’ about similar topics, it is more likely to understate similarity and the cosine similarity measures based on it are more likely to overstate novelty and understate the relatedness of follow-on research.
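The construction described above (bigram TFIDF vectors compared by cosine similarity) can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the paper: the stop-word list, the toy abstracts, and the add-one smoothing of the document frequency (so that bigrams never seen before still receive a finite weight) are all assumptions of the sketch.

```python
import math
from collections import Counter

# Hypothetical short stop-word list; this is not the list in Table A.10.
STOP_WORDS = {"the", "of", "a", "and", "in"}

def bigrams(text):
    """Lower-case the text, drop stop words, and return consecutive word pairs."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    return list(zip(words, words[1:]))

def tfidf(doc, prior_docs):
    """Backward TFIDF vector (as a dict) of `doc`, with IDF from `prior_docs` only.

    Assumption of this sketch: the inverse document frequency is smoothed as
    log((N + 1) / (df + 1)), with N prior documents and df of them containing
    the bigram.
    """
    counts = Counter(bigrams(doc))
    total = sum(counts.values())
    prior_sets = [set(bigrams(p)) for p in prior_docs]
    n_prior = len(prior_sets)
    vec = {}
    for term, c in counts.items():
        df = sum(term in s for s in prior_sets)
        vec[term] = (c / total) * math.log((n_prior + 1) / (df + 1))
    return vec

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

# Toy abstracts (hypothetical). The same set of prior documents is used for
# both members of a pair, as described in the text.
prior = ["performance pay raises output",
         "career concerns shape effort",
         "pay schemes differ"]
focal = "performance pay raises research output"
similar = "performance pay raises citation counts"
unrelated = "tenure timing is exogenous"

sim_close = cosine(tfidf(focal, prior), tfidf(similar, prior))
sim_far = cosine(tfidf(focal, prior), tfidf(unrelated, prior))
print(round(sim_close, 3), sim_far)
```

Bigrams that already appear in prior documents receive a lower weight than new ones, so two abstracts that share only well-worn word pairs are scored as less similar than two that share rarely used ones.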
I use metrics based on the TFIDF here because we may be particularly concerned about potential decreases in the novelty of knowledge work in response to performance pay and the TFIDF metrics provide conservative estimates of any decreases in novelty. Because the analysis in this paper is at the individual academic level rather than at the publication or patent level, as is the case in Kelly et al. (forthcoming), and because publications are matched to academics on the basis of, among other things, name and field, I deviate from the metrics in the latter paper in two ways. First, I restrict the set of comparison publications to the same field as the focal publication. Second, for each similarity measure, I calculate quantile frequency bins in the same way as for citations above, so as to analyze the effect of performance pay on the distribution of the similarity of papers to past and future work. The novelty metric analyses align with the earlier quality response results; the additional work produced in response to performance pay is, on average, not the most novel or impactful. Panel A in Figure 7 shows that there is a significant increase in the frequency of top quartile backward cosine similarity papers as well as a marginally significant increase in the bottom decile and quartile bins of the same variable. That is, in response to performance pay, more papers that are very similar to previously published papers are produced, though there is also a slight increase in papers that are very dissimilar to prior work. At the same time, there is an increase in papers in the third quartile bin of forward similarity metrics, and hence in papers that give rise to

considerable, but not the most, follow-on research or - which would be indistinguishable - are part of a burgeoning literature (Figure 7 Panel B). Breaking down the novelty metric analysis by productivity quantiles provides further insight into the earlier heterogeneous quality response results. Top productivity academics produce more medium to highly novel work that garners mid- to high-level follow-on, low productivity academics produce additional papers that are just above the median in terms of both similarity to past and future work, while sub-top productivity academics produce more papers that are very novel, but also garner only very little follow-on work (Figures A.4 and A.5).

4.3 Selection Effect

The empirical analysis has so far focused on the effort effect of performance pay. This is only one channel through which performance pay can increase output. I now turn attention to another important channel: selection. To this end, I study which academics sort into performance pay, by switching from the old age-related pay scheme to the new performance pay scheme. As shown in the theoretical section, selection into academia should follow the same pattern as selection into performance pay as the latter drives the former.

4.3.1 Non-Parametric Analysis

As a first test, I analyze the hazard and survival rates of switches to performance pay of academics with a tenured affiliation at a public university before the reform. These academics have the choice (i.e. are “at risk”) of switching pay scheme because they are initially paid according to the age-related pay scheme. They can select into the performance pay scheme by changing affiliation or position, or renegotiating their contract. Accordingly, I label any first affiliation, position or contract change29 after implementation of the pay reform (as of 2005) as a switch to performance pay (the “failure” event) and use the time until such a switch as the duration variable. I count the time it takes for academics to switch from their most recent affiliation, position or contract change before the reform implementation (the at risk duration) if they are tenured, at a public university and not retired. I restrict attention to academics who start their first tenured affiliation after 1998, so that I observe their full tenured affiliation history and hence the moment they become “at risk” of switching30,31. There are 11237 such academics and I observe 1231 switches in a total of 85716 periods (years) during which these academics can switch from the

29 I assume that, whenever an academic receives an offer, they either accept and change position, or reject and renegotiate their current contract. In either case, the academic switches to the new performance pay scheme if the change or renegotiation happens after the reform. If there are academics who do not at least renegotiate their contract when they receive an offer, these academics are more likely to be of a lower productivity type, and including them in the pool of switchers would reduce the estimate of the selection effect I find. Preissler (2006) reports that only a small number of professors chose to opt into the W-pay scheme without another/outside option.
30 As in the effort effect analysis, I also restrict the sample to academics who do not have a foreign affiliation before their first tenured affiliation, since I do not have full affiliation and publication records for academics from abroad.
31 Including academics who enter into the data set in a tenured position would introduce left truncation into the survival analysis sample. Some academics who enter the dataset in a tenured position change contracts before the reform is implemented. For those academics I know when they become at risk of switching to performance pay. For others, who do not change contract before the reform implementation, I would not know their risk origin date and hence their risk duration. The former academics may well be systematically different from the latter, which would bias the analysis.

age-related to the performance pay system. The average incidence rate of switches is 0.014. Hence, on average, academics in this group have a 1.4% chance of switching to performance pay in any year after 2004. Figure 8 shows the Epanechnikov kernel density estimates of the hazard function for switches from age-related to performance pay for academics whose average productivity falls in the top quartile or bottom three quartiles of the average productivity distribution. I base productivity quartiles on the average of the impact factor-rated number of publications in the three pre-implementation years (2002, 2003 and 2004), calculating quartiles separately by academic field and broad tenure group (tenured before or after the reform). The hazard rate for switching to the performance pay scheme is clearly greater for top quartile academics throughout, so higher productivity academics are more likely to sort into the performance pay scheme. A log-rank test of the equality of the survival functions of top quartile academics and bottom three quartile academics rejects the equality of the survival functions at the 1% significance level32. Results are robust to using quartiles based on the sum of citations or impact factor-rated number of publications weighted by number of authors (Figure A.6).
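The two headline numbers of this subsection can be checked with simple arithmetic. The sketch below recomputes the average incidence rate from the reported counts (1231 switches in 85716 at-risk years) and the p-value implied by the reported log-rank chi-squared statistic of 8.14. Using one degree of freedom is an assumption that follows from the comparison involving two groups.

```python
import math

switches = 1231   # observed switches to performance pay
periods = 85716   # academic-years at risk of switching
incidence = switches / periods
print(round(incidence, 3))

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    # A chi-squared(1) variable is the square of a standard normal, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

p_value = chi2_sf_1df(8.14)
print(round(p_value, 4))
```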

4.3.2 Difference-in-Differences Estimation

In order to distinguish any general switching patterns from the selection effect of performance pay, I estimate the selection effect of the introduction of performance pay parametrically in a difference-in-differences framework. Here I exploit variation along two dimensions: an academic’s tenure cohort and age. Academics who start their first tenured position before 2005 fall under the age-related pay scheme when the reform is implemented. They switch to performance pay when they change their affiliation, position or contract after the reform takes effect in 2005. In contrast, academics who start their first tenured position after the reform implementation automatically fall under the performance pay scheme from the moment they make tenure and thus cannot switch to performance pay. Hence the cohort of academics who start their first tenured position before 2005 is the treated cohort here, while those who make tenure after 2004 make up the control cohort. Due to the single-crossing property of the basic wage schemes for age-related and performance pay (cf. Figure A.1), selection incentives are weaker for older academics. In order to make switching worth their while, academics need to be able to more than make up for the difference in basic wage. Because this difference is larger for older academics, they will need to be of higher ability in order to want to switch. That is, compared to academics in the control group, the risk of switching increases relatively less with productivity for older academics in the treated cohort.

32 The log-rank test returns a Chi-squared statistic of 8.14 (p-value 0.0043).

I estimate the following Weibull proportional hazard model to test this:

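\[
\lambda_{i,t} = \rho\, t^{\rho-1} \exp\big[\beta_0 + \beta_1 \mathit{Treat}_i + \beta_2 \mathit{Age}_{i,t} + \beta_3 \mathit{AvgProd}_i + \beta_4 \mathit{Age}_{i,t} \times \mathit{Treat}_i + \beta_5 \mathit{AvgProd}_i \times \mathit{Treat}_i + \beta_6 \mathit{AvgProd}_i \times \mathit{Age}_{i,t} + \beta_7 \mathit{AvgProd}_i \times \mathit{Age}_{i,t} \times \mathit{Treat}_i + X_i + u_{i,t}\big] \tag{6}
\]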

Treat is a dummy variable that is 1 for academics who started a first tenured affiliation before 2005, and 0 for those whose first tenured affiliation started as of 2005. AvgProd is an academic’s average productivity calculated as three-year pre-implementation averages (2002-2004) of the impact factor-rated number of publications. The age variable is equal to an author’s self-reported age if known, and equal to a synthetic age otherwise. I calculate synthetic ages using the average age at habilitation, promotion or tenure. All models control for academic field fixed effects³³ and are estimated for years t > 2004 and for academics who start their first tenured position after 1998 and do not hold a foreign position immediately prior. My preferred specification also controls for synthetic age at the start of the first tenured affiliation. Standard errors are robust and clustered by individual academic. The a and b columns in Panel A of Table 4 report estimation results of specifications without and with age-at-tenure as additional control, respectively. The positive and significant coefficient estimates of the interaction AvgProd ∗ Treat in column 1 imply that a one standard deviation increase in the average productivity of treated academics increases the rate at which they switch to performance pay by 10.8% to 20%³⁴ more than for academics in the control group. This holds while controlling for the interaction term Age ∗ Treat, which, as expected, is negative. Column 2 shows that, while a one standard deviation increase in average productivity reduces the negative effect of age on switching rates by 2.6% on average, this moderating effect of productivity is 2.6% less for treated academics (compare the coefficients of AvgProd ∗ Age and AvgProd ∗ Age ∗ Treat). That is to say, treated academics need to be of even higher productivity in order to switch.
The selection effect of performance pay, net of general differential sorting patterns by age and productivity types, is thus positive and significant. The finding that performance pay attracts more productive academics is robust to estimating the model as a Cox proportional hazard model (Table A.6 Panel B) or estimating the Weibull model with academic field strata (Table A.6 Panel C)³⁵. Replacing the AvgProd variable in the baseline Weibull model (6) by a dummy indicating whether an academic’s pre-reform average productivity is above median shows that having above median productivity reduces the negative effect of an extra year of age on switching rates by 5.8% on average (Column 1c Table A.6 Panel A). However, this moderating effect of above median productivity is 3.8% less for treated academics, so that treated academics need to be even more productive to switch.
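To make the structure of the Weibull proportional hazard model concrete, the following minimal sketch evaluates a hazard of the same functional form for two academics who differ only in average productivity. All coefficient values and the shape parameter are hypothetical placeholders chosen for illustration, not the paper's estimates; field fixed effects and the error term are omitted, and age is held fixed for simplicity even though it varies over time in the estimation.

```python
import math

# Hypothetical coefficients for illustration only; NOT the paper's estimates.
COEF = {
    "const": -4.0,
    "treat": 0.3,
    "age": -0.05,
    "prod": 0.002,
    "age_treat": -0.02,
    "prod_treat": 0.004,
    "prod_age": 0.0005,
    "prod_age_treat": -0.0005,
}
RHO = 1.2  # hypothetical Weibull shape parameter


def linear_index(treat, age, avg_prod, coef=COEF):
    """Covariate index with the main effects and all interactions of Treat, Age and AvgProd."""
    return (
        coef["const"]
        + coef["treat"] * treat
        + coef["age"] * age
        + coef["prod"] * avg_prod
        + coef["age_treat"] * age * treat
        + coef["prod_treat"] * avg_prod * treat
        + coef["prod_age"] * avg_prod * age
        + coef["prod_age_treat"] * avg_prod * age * treat
    )


def weibull_hazard(t, treat, age, avg_prod, rho=RHO, coef=COEF):
    """Weibull proportional hazard: rho * t**(rho - 1) * exp(linear index)."""
    if t <= 0 or rho <= 0:
        raise ValueError("t and rho must be positive")
    return rho * t ** (rho - 1) * math.exp(linear_index(treat, age, avg_prod, coef))


def hazard_ratio(t, treat, age, prod_low, prod_high):
    """Hazard of the more productive academic relative to the less productive one."""
    high = weibull_hazard(t, treat, age, prod_high)
    low = weibull_hazard(t, treat, age, prod_low)
    return high / low


# Two treated academics aged 45 who differ by 25 units of average productivity.
ratio_early = hazard_ratio(1.0, treat=1, age=45, prod_low=10.0, prod_high=35.0)
ratio_late = hazard_ratio(6.0, treat=1, age=45, prod_low=10.0, prod_high=35.0)
print(f"hazard ratio at t=1: {ratio_early:.4f}, at t=6: {ratio_late:.4f}")
```

The two ratios coincide because the baseline hazard cancels, which is the proportional hazards property that lets the coefficients be read as proportional shifts in the switching rate.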

33 These fields are: theology; philosophy and history; social sciences; philology and cultural studies; law; economics; mathematics, physics and computer science; biology, chemistry, earth sciences and pharmaceutics; engineering; agricultural sciences, nutrition and veterinary medicine; medicine (human); dentistry.

34 Calculated as EXP(0.004*25.53797)-1, where 25.53797 is the standard deviation of the average productivity variable here.

35 Results are robust to basing the average productivity variable on other productivity measures as well. Results are available from the author on request.
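The conversion in footnote 34 can be written out as a worked step. In a proportional hazard model, the differential effect of a one standard deviation increase in AvgProd for treated academics is the exponential of the interaction coefficient times the standard deviation, minus one. The symbols \hat{\beta}_{AvgProd \times Treat} and \sigma_{AvgProd} are shorthand introduced here for illustration, and the step ignores the age interactions, as the footnote's calculation does.

```latex
\[
\exp\!\big(\hat{\beta}_{AvgProd \times Treat} \cdot \sigma_{AvgProd}\big) - 1
  = \exp(0.004 \times 25.53797) - 1
  = \exp(0.10215) - 1
  \approx 0.108
\]
```

This reproduces the 10.8% lower bound quoted in the text.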

4.3.3 Validity Checks

To assess the validity of the selection effect estimation, I run placebo estimations of the aforementioned models. Academics who start their first tenured position before 2002 are defined as the placebo treatment group here, while academics who start their first tenured affiliation between 2002 and 2005 act as the control group. Both groups switch into the performance pay scheme when they change affiliation, position or contract after 2004, so they face the same selection incentives. The estimation results of the baseline placebo estimation are reported in Panel B of Table 4, with robustness checks in columns 2a, 2b and 2c of Table A.6 Panel A. Reassuringly, neither the AvgProd ∗ Treat interaction nor the AvgProd ∗ Age ∗ Treat triple interaction is ever significant, so there is no evidence that productive academics in the placebo treatment group are more likely to select into performance pay than academics in the placebo control group. As a final check, I also estimate any changes in affiliation, position or contract switching rates from before to after the implementation of the reform for academics in the treated cohort (those whose first tenured affiliation started before 2005). Table A.7 shows that, while a one standard deviation increase in average productivity increases the likelihood of a switch by around 5.5% on average, this increase in the likelihood of a switch grows to 8 or 9% after the implementation of the reform. This corroborates the finding of a positive selection effect of performance pay.

5 CONCLUSION

This paper shows that performance incentives in knowledge creation give rise to greater output quantity, but not more of the highest quality output. The theoretical model presented in this paper shows that this is what we would expect to see if output quality measures are noisy. This holds even though the performance incentives studied include implicit career concerns incentives, which potentially allow for performance assessment over a longer time horizon and therefore a less noisy measure of quality. Even then, in the absence of readily available, precise measures of quality such as novelty or impact, principals may still have to resort to noisier signals of knowledge production quality. It would therefore be valuable if more informative measures of novelty, impact or other quality dimensions became available. This paper employs one potential measure, and with the advent of ever more powerful machine learning algorithms, the availability and precision of relevant performance metrics for knowledge creation could likely be improved. This seems a worthwhile avenue for research and development, not just for academic research, but for knowledge output more generally.

Another multi-tasking issue pertains to the different dimensions of academic jobs in particular. If performance in research is easier to measure than educational performance, for instance, or if more weight is given to research output metrics, academics may have shifted effort away from teaching to focus more on research, even if incentivized on all dimensions. It would be worthwhile to assess if and how performance in teaching and in the promotion of young scholars changes in response to performance pay. This is left for future research.

The paper also shows that more productive academics are more likely to sort into higher-powered incentives, so I do not find evidence of crowding out in this regard (Benabou and Tirole, 2003). The theoretical model shows that selection into academia should follow the same pattern to the extent this is driven by performance incentives. There may be other factors that drive selection into academia or other knowledge jobs, such as risk preferences or differential opportunity costs of the generally long training trajectories required for knowledge work. More research into selection into knowledge creation would therefore be worthwhile.

Academic research is an important instance of knowledge work, and understanding the effect of performance pay on both effort and selection in this particular context is valuable in its own right. Academia is also an interesting and useful setting in which to study the organization of knowledge work more generally and, as such, the findings in this paper have implications for knowledge work in other contexts. Academia does, however, have a number of characteristics that may not be present to the same degree in other contexts, and this has implications for the extent to which the findings in this paper carry over to other settings. The knowledge created in academia is highly visible and available to a broad audience, and academics are (expected to be) highly mobile. Both of these characteristics are conducive to career concerns. The introduction of performance pay, specifically the implicit career concerns incentives introduced with the reform studied in this paper, may therefore not have as strong an effect in sectors in which these conditions are not met. In those areas, principals may have to rely more on explicit, on-the-job performance incentives. These were, however, not found to give rise to an additional significant effort response in academia. In industries in which knowledge is confidential, it may therefore be worthwhile to contemplate publicizing (some of) the knowledge generated as a means of motivating workers, and more research into the publicness of output and reputation concerns would be valuable.

References

Abraham, Sarah, and Liyang Sun. 2018. “Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects.” arXiv preprint arXiv:1804.05785.
Academics.de. 2016. “Tenure Track: Professor Auf Lebenszeit.”
Aghion, Philippe, , Caroline Hoxby, Andreu Mas-Colell, and André Sapir. 2010. “The Governance and Performance of Universities: Evidence from Europe and the US.” Economic Policy, 25(61): 7–59.
Aghion, Philippe, Peter Howitt, and David Mayer-Foulkes. 2005. “The Effect of Financial Development on Convergence: Theory and Evidence.” The Quarterly Journal of Economics, 120(1): 173–222.
Audretsch, David B, and Maryann P Feldman. 1996. “R&D Spillovers and the Geography of Innovation and Production.” The American Economic Review, 630–640.
Autor, David H. 2019. “Work of the Past, Work of the Future.” Vol. 109, 1–32.
Azoulay, Pierre, Jeffrey L Furman, Joshua L Krieger, and Fiona Murray. 2015. “Retractions.” Review of Economics and Statistics, 97(5): 1118–1136.
Azoulay, Pierre, Joshua S Graff Zivin, and Gustavo Manso. 2011. “Incentives and Creativity: Evidence from the Academic Life Sciences.” The RAND Journal of Economics, 42(3): 527–554.
Azoulay, Pierre, Waverly Ding, and Toby Stuart. 2009. “The Impact of Academic Patenting on the Rate, Quality and Direction of (public) Research Output.” The Journal of Industrial Economics, 57(4): 637–676.
Bandiera, Oriana, Iwan Barankay, and . 2005. “Social Preferences and the Response to Incentives: Evidence from Personnel Data.” The Quarterly Journal of Economics, 917–962.
Benabou, Roland, and Jean Tirole. 2003. “Intrinsic and Extrinsic Motivation.” The Review of Economic Studies, 70(3): 489–520.
Bénabou, Roland, and Jean Tirole. 2006. “Incentives and Prosocial Behavior.” American Economic Review, 96(5): 1652–1678.
Bergsdorf, Wolfgang. 2005. “Richtlinie Der Universitaet Erfurt Ueber Das Verfahren Und Die Vergabe Von Leistungsbezuegen.”
Besley, Timothy, and Maitreesh Ghatak. 2005. “Competition and Incentives with Motivated Agents.” American Economic Review, 95(3): 616–636.
Besley, Timothy, and Maitreesh Ghatak. 2018. “Prosocial motivation and incentives.” Annual Review of Economics, 10: 411–438.
Biester, Christoph. 2010. “Der Universitaere Metabolismus, Die Buerokratisierung Der Leistungsorientierten Verguetung in Der W-besoldung.” Online powerpoint, Accessed on 6 May 2015.
Björk, Bo-Christer, and David Solomon. 2013. “The Publishing Delay in Scholarly Peer-reviewed Journals.” Journal of Informetrics, 7(4): 914–923.
BMBF. 2002. “Gesetz Zur Reform Der Professorenbesoldung.” ProfBesReformG.
BMI. 2007. “Bericht Zum Besoldungsrechtlichen Vergaberahmen Bei Der Professorenbesoldung Nach § 35 Abs. 5 Bundesbesoldungsgesetz.” Bundesministerium des Innern. Stand 29 February 2008.
Boly, Amadou. 2011. “On the Incentive Effects of Monitoring: Evidence from the Lab and the Field.” Experimental Economics, 14(2): 241–253.
Bonatti, Alessandro, and Johannes Hörner. 2017. “Career concerns with exponential learning.” Theoretical Economics, 12(1): 425–475.
Borjas, George J, and Kirk B Doran. 2015. “Prizes and Productivity: How Winning the Fields Medal Affects Scientific Output.” Journal of Human Resources, 50(3): 728–758.
Borusyak, Kirill, and Xavier Jaravel. 2017. “Revisiting Event Study Designs.” Available at SSRN 2826228.
Boudreau, Kevin J, Karim R Lakhani, and Michael Menietti. 2016. “Performance Responses to Competition across Skill Levels in Rank-order Tournaments: Field Evidence and Implications for Tournament Design.” The RAND Journal of Economics, 47(1): 140–165.
Boudreau, Kevin J, Nicola Lacetera, and Karim R Lakhani. 2011. “Incentives and Problem Uncertainty in Innovation Contests: An Empirical Analysis.” Management Science, 57(5): 843–863.
Bundesgesetzblatt. 1985. “Bundesbeamtengesetz.”
Carpenter, Jeffrey, Peter Hans Matthews, and John Schirm. 2010. “Tournaments and Office Politics: Evidence from a Real Effort Experiment.” The American Economic Review, 100(1): 504–517.
Chan, Ho Fai, Bruno S Frey, Jana Gallus, and Benno Torgler. 2014. “Academic Honors and Performance.” Labour Economics, 31: 188–204.
Chevalier, Judith, and Glenn Ellison. 1999. “Career concerns of mutual fund managers.” The Quarterly Journal of Economics, 114(2): 389–432.
Clarivate Analytics. 1993-2012a. “ISI Web of Science.”
Clarivate Analytics. 2000-2012b. “Journal Citation Report.”
Clarivate Analytics. 2017. “Journal Citation Report.”
De Groot, Morris H. 1970. Optimal Statistical Decisions. McGraw-Hill.
De Gruyter. 2006. “Kuerschners Deutscher Gelehrten Kalender.” cd-rom.
De Gruyter. 2008. “Kuerschners Deutscher Gelehrten Kalender.” cd-rom.
De Gruyter. 2013. “Kuerschners Deutscher Gelehrten Kalender Online.”
Detmer, Hubert, and Ulrike Preissler. 2004. “Abenteuer W, Strategien, Risiken Und Chancen.” Forschung Und Lehre, (6): 308–311.
Detmer, Hubert, and Ulrike Preissler. 2005. “Die Neue Professorenbesoldung, Ein Ueberblick.” Forschung Und Lehre, (5): 256–258.
Detmer, Hubert, and Ulrike Preissler. 2006. “Die W-besoldung Und Ihre Anwendung in Den Bundeslaendern.” Beitraege Zur Hochschulforschung, 28(2): 50–65.
Dewatripont, Mathias, Ian Jewitt, and Jean Tirole. 1999. “The economics of career concerns, part II: Application to missions and accountability of government agencies.” The Review of Economic Studies, 66(1): 199–217.
DFG. 2016. “Excellence Initiative 2005-2017.”
DHV. 1999-2013. “Forschung Und Lehre.”
DHV. 2002. “Habilitationen Und Berufungen.” Forschung Und Lehre, 41.
DHV. 2014. “Forschung und Lehre - Wir Ueber Uns.”
Dickinson, David, and Marie-Claire Villeval. 2008. “Does Monitoring Decrease Work Effort?: The Complementarity between Agency and Crowding-out Theories.” Games and Economic Behavior, 63(1): 56–76.
Dilger, Alexander. 2013. “Vor- Und Nachteile Der W-besoldung.” Discussion Paper of the Institute for Organisational Economics. Westfaelische Wilhelms-Universitaet Muenster.
Dohmen, Thomas, and . 2011. “Performance Pay and Multidimensional Sorting: Productivity, Preferences, and Gender.” The American Economic Review, 556–590.
Ederer, Florian, and Gustavo Manso. 2013. “Is Pay for Performance Detrimental to Innovation?” Management Science, 59(7): 1496–1513.
Erat, Sanjiv, and Uri Gneezy. 2016. “Incentives for Creativity.” Experimental Economics, 19(2): 269–280.
Expertenkommission. 2000. “Bericht Der Expertenkommission - Reform Des Hochschuldienstrechts.”
Ferrer, Rosa. 2016. “The effect of lawyers’ career concerns on litigation.”
Fitzenberger, Bernd, and Ute Schulze. 2014. “Up or Out: Research Incentives and Career Prospects of Postdocs in Germany.” German Economic Review, 15(2): 287–328.
Freeman, Richard B, and Alexander M Gelber. 2010. “Prize Structure and Information in Tournaments: Experimental Evidence.” American Economic Journal: Applied Economics, 2(1): 149–164.
Gibbons, Robert, and Kevin J Murphy. 1992. “Optimal Incentive Contracts in the Presence of Career Concerns: Theory and Evidence.” Journal of Political Economy, 100(3): 468–505.
Gibbs, Michael, Susanne Neckermann, and Christoph Siemroth. 2017. “A Field Experiment in Motivating Employee Ideas.” Review of Economics and Statistics, 99(4): 577–590.
Gien, Gabriele. 2017. “Satzung Der Katholischen Universitaet Eichstaett-ingolstadt Zur Regelung Des Verfahrens Der Bewertung Der Besonderen Leistungen Zur Vergabe Der Besonderen Leistungsbezuege.”
Goodman-Bacon, Andrew. 2018. “Difference-in-differences with Variation in Treatment Timing.” National Bureau of Economic Research.
Gourieroux, Christian, Alain Monfort, and Alain Trognon. 1984. “Pseudo Maximum Likelihood Methods: Applications to Poisson Models.” Econometrica: Journal of the Econometric Society, 701–720.
Gross, Daniel P. 2016. “Creativity under Fire: The Effects of Competition on Creative Production.” Review of Economics and Statistics, 1–49.
Haeck, Catherine, and Frank Verboven. 2012. “The Internal Economics of a University: Evidence from Personnel Data.” Journal of Labor Economics, 30(3): 591–626.
Hall, Bronwyn H, and Dietmar Harhoff. 2012. “Recent Research on the Economics of Patents.” Annual Review of Economics, 4(1): 541–565.
Handel, Kai Christian. 2005. Die Umsetzung Der Professorenbesoldungsreform in Den Bundesländern. CHE.
Harbring, Christine, Bernd Irlenbusch, and Matthias Kräkel. 2004. “Ökonomische Analyse Der Professorenbesoldungsreform in Deutschland.” 197–219.
Hellmann, Thomas, and Veikko Thiele. 2011. “Incentives and Innovation: A Multitasking Approach.” American Economic Journal: Microeconomics, 3(1): 78–128.
Hochschullehrerbund. 2009. “Die Professorengehaelter in Der W-besoldung, Art Und Umfang Von Berufungsverhandlungern.” Online.
Holmström, Bengt. 1982. “Managerial Incentives Schemes-a Dynamic Perspective.” Essays in Economics and Management in Honor of Lars Wahlbeck.
Holmström, Bengt. 1999. “Managerial Incentive Problems: A Dynamic Perspective.” The Review of Economic Studies, 66(1): 169–182.
Holmstrom, Bengt, and Paul Milgrom. 1991. “Multitask principal-agent analyses: Incentive contracts, asset ownership, and job design.” JL Econ. & Org., 7: 24.
Hong, Harrison, Jeffrey D Kubik, and Amit Solomon. 2000. “Security analysts’ career concerns and herding of earnings forecasts.” The RAND Journal of Economics, 121–144.
Hossain, Tanjim, and John A List. 2012. “The Behavioralist Visits the Factory: Increasing Productivity Using Simple Framing Manipulations.” Management Science, 58(12): 2151–2167.
HRK. 2014. “Hochschulkompass.”
Huber, Bernd. 2005. “Richtlinien Der Ludwig-maximilians-universitaet Muenchen Zur Regelung Der Grundsaetze Fuer Die Vergabe Von Leistungsbezuegen.”
Hvide, Hans K, and Benjamin F Jones. 2018. “University Innovation and the Professor’s Privilege.” American Economic Review, 108(7): 1860–98.
Iaria, Alessandro, Carlo Schwarz, and Fabian Waldinger. 2018. “Frontier Knowledge and Scientific Production: Evidence from the Collapse of International Science.” The Quarterly Journal of Economics, 133(2): 927–991.
Jaffe, Adam B, Manuel Trajtenberg, and Rebecca Henderson. 1993. “Geographic Localization of Knowledge Spillovers As Evidenced by Patent Citations.” The Quarterly Journal of Economics, 108(3): 577–598.
Jones, Benjamin F. 2009. “The Burden of Knowledge and the Death of the Renaissance Man: Is Innovation Getting Harder?” The Review of Economic Studies, 76(1): 283–317.
Kelly, Bryan, Dimitris Papanikolaou, Amit Seru, and Matt Taddy. forthcoming. “Measuring Technological Innovation Over the Long Run.”
Kräkel, Matthias. 2006. “Zur Reform Der Professorenbesoldung in Deutschland.” Perspektiven Der Wirtschaftspolitik, 7(1): 105–126.
Lach, Saul, and Mark Schankerman. 2004. “Royalty Sharing and Technology Licensing in Universities.” Journal of the European Economic Association, 2(2-3): 252–264.
Lach, Saul, and Mark Schankerman. 2008. “Incentives and Invention in Universities.” The RAND Journal of Economics, 39(2): 403–433.
Lavy, Victor. 2009. “Performance Pay and Teachers’ Effort, Productivity, and Grading Ethics.” The American Economic Review, 1979–2011.
Lazear, Edward P. 2000. “Performance Pay and Productivity.” The American Economic Review, 90(5): 1346–1361.
Lazear, Edward P, and Paul Oyer. 2012. “.” The Handbook of Organizational Economics, 479.
Leitungsgremium, Universitaet Augsburg. 2005. “Grundsaetze Der Universitaet Augsburg Fuer Die Vergabe Von Leistungsbezuegen.”
Lerner, Josh, and Julie Wulf. 2007. “Innovation and Incentives: Evidence from Corporate R&D.” The Review of Economics and Statistics, 89(4): 634–644.
Leuven, Edwin, Hessel Oosterbeek, Joep Sonnemans, and Bas Van Der Klaauw. 2011. “Incentives Versus Sorting in Tournaments: Evidence from a Field Experiment.” Journal of Labor Economics, 29(3): 637–658.
Lucas, R. E. 1988. “On the Mechanisms of Economic Development.” Journal of Monetary Economics, 22(1): 3–32.
Lünstroth, Pia. 2011. “Leistungslohn Und Kooptation-eine ökonomische Analyse Der Reform Der Professorenbesoldung.” PhD diss. Universitaet Trier.
Lutter, Mark, and Martin Schröder. 2016. “Who Becomes a Tenured Professor, and Why? Panel Data Evidence from German Sociology, 1980–2013.” Research Policy, 45(5): 999–1013.
Macleod, Bentley. n.d. Beyond Price Theory. MIT Press.
Manso, Gustavo. 2011. “Motivating Innovation.” The Journal of Finance, 66(5): 1823–1860.
McCormack, John, Carol Propper, and Sarah Smith. 2014. “Herding Cats? Management and University Performance.” The Economic Journal, 124(578): F534–F564.
Miklós-Thal, Jeanine, and Hannes Ullrich. 2016. “Career Prospects and Effort Incentives: Evidence from Professional Soccer.” Management Science, 62(6): 1645–1667.
Mohr, Joachim. 2007. “Professoren, Die Vertreibung Der Weisen.” Spiegel Online.
Muralidharan, Karthik, and Venkatesh Sundararaman. 2011. “Teacher Performance Pay: Experimental Evidence from India.” The Journal of Political Economy, 119(1): 39–77.
Oeffentlicher Dienst. 2004. “Gesetz Ueber Die Erhoehung Von Dienst- Und Versorgungsbezuegen in Bund Und Laendern 2003/2004.” Last accessed 11-08.2015.
Oyer, Paul, and Scott Schaefer. 2011. “Personnel Economics: Hiring and Incentives.” Handbook of Labor Economics, 4: 1769–1823.
Preissler, U. 2006. “Zwischenbilanz Professorenbesoldung - Sichtweise Und Beratung Des Deutschen Hochschulverbandes.” Arbeitsgruppe Fortbildung im Sprecherkreis der Deutschen Universitätskanzler.
Pritchard, Rosalind. 2006. “Trends in the Restructuring of German Universities.” Comparative Education Review, 50(1): 90–112.
Romer, Paul M. 1986. “Increasing Returns and Long-run Growth.” The Journal of Political Economy, 94(5): 1002–1037.
Rubin, Jared, Anya Samek, and Roman M Sheremeta. 2018. “Loss aversion and the quantity–quality tradeoff.” Experimental Economics, 21(2): 292–315.
Schniederjuergen, Axel. 2013a. E-mail.
Schniederjuergen, Axel. 2013b. E-mail.
Shearer, Bruce. 2004. “Piece Rates, Fixed Wages and Incentives: Evidence from a Field Experiment.” The Review of Economic Studies, 71(2): 513–534.
Silva, JMC Santos, and Silvana Tenreyro. 2006. “The Log of Gravity.” The Review of Economics and Statistics, 88(4): 641–658.
Universitaet Regensburg, Universitaetsleitung. 2016. “Grundsaetze Der Universitaet Regensburg Zur Vergabe Von Leistungsbezuege.”
Von Proff, Sidonia, Guido Buenstorf, and Martin Hummel. 2012. “University Patenting in Germany before and After 2002: What Role Did the Professors’ Privilege Play?” Industry and Innovation, 19(1): 23–44.
Wooldridge, Jeffrey M. 2010. Econometric Analysis of Cross Section and Panel Data. MIT Press.
Wuchty, Stefan, Benjamin F Jones, and Brian Uzzi. 2007. “The Increasing Dominance of Teams in Production of Knowledge.” Science, 316(5827): 1036–1039.
Wu, Yanhui, and Feng Zhu. 2018. “Competition, Contracts, and Creativity: Evidence from Novel Writing in a Platform Market.”

Figure 1: Number of Publications in Citation and IFR Publication Bins

Panels: (a) Citations; (b) Impact Factor Rated Publications

Notes: Histograms depict the coefficient estimates and corresponding 95% confidence intervals of the Post’02 ∗ Treatment (grey bars) and Tenure ∗ Treatment (white bars) interactions in separate regressions with citation quantile frequency variables as dependent variables in sub-figure a, and impact factor rating quantile frequency variables as dependent variables in sub-figure b. To generate these dependent variables, percentiles of citations and impact factor ratings are calculated separately by field and publication year and the percentile cut-offs are used to generate quantile frequency variables for each author and publication year. All other specifications as in the baseline estimation of the effort effect.

Figure 2: Pre-trends and Effect Dynamics - Treatment vs. Control Cohort

Panels: (a) Number of Publications; (b) Impact Factor-Rated Number of Publications; (c) Total Number of Citations; (d) Average Impact Factor Rating; (e) Average Number of Citations

Notes: Plots depict the coefficient estimates and corresponding 95% confidence intervals of the interactions of a treatment dummy and time-to-tenure fixed effects in regressions with the following dependent variables: number of publications (panel a), impact factor-rated number of publications (panel b), total number of citations (panel c), average impact factor rating (panel d), and average number of citations (panel e). All other specifications as in the baseline estimation of the effort effect.

Figure 3: Pre-trends and Effect Dynamics - Placebo Experiment

Panels: (a) Number of Publications; (b) Impact Factor-Rated Number of Publications; (c) Total Number of Citations; (d) Average Impact Factor Rating; (e) Average Number of Citations

Notes: Figures depict the coefficient estimates and corresponding 95% confidence intervals of the interactions of a treatment dummy and time-to-tenure fixed effects in regressions with the following dependent variables: number of publications (panel a), impact factor-rated number of publications (panel b), total number of citations (panel c), average impact factor rating (panel d), and average number of citations (panel e). The sample is restricted to academics who started their first tenured affiliation at a German public university in 2001, 2002, 2003 or 2004 and did not hold a foreign affiliation immediately prior. Treatment is 1 if an academic makes tenure at a public university in 2003 or 2004 and 0 if they become tenured in 2001 or 2002 (the control group). All other specifications as in the baseline estimation of the effort effect.

Figure 4: Heterogeneous Effort Effect by Broad Academic Field

Panels: (a) Natural and Applied Sciences; (b) Social Sciences and Humanities

Notes: Histograms depict the coefficient estimates and corresponding 95% confidence intervals of the Post’02 ∗ Treatment (grey bars) and Tenure ∗ Treatment (white bars) interactions in separate regressions with the following dependent variables: the number of publications, the impact factor-rated number of publications, the total number of citations to all publications published in a given year, the average impact factor rating of publications published in a given year, and the average number of citations to publications published in a given year. In sub-figure a, the sample is restricted to academics in the natural and applied sciences. In sub-figure b, the sample is restricted to academics in the social sciences and humanities. Mathematics, physics and informatics, biology, chemistry, earth sciences, pharmacology, engineering, medicine, dentistry, veterinary, agricultural science and nutrition science are classified as natural and applied sciences, while social sciences and humanities comprise theology, philosophy and history, philology and anthropology, law, economics and other social sciences. All other specifications as in the baseline estimation of the effort effect.

Figure 5: Heterogeneous Effort Effect by Productivity Quantile

Panels: (a) 10th (top) Decile; (b) 9th Decile; (c) 8th Decile; (d) 7th Decile; (e) 6th Decile; (f) Below Median

Notes: Panels are restricted to, respectively, the top five deciles (sub-figures a-e) and below median academics (sub-figure f). Productivity deciles are determined on the basis of the averages of the impact factor-rated number of publications over the three pre-announcement years 1999, 2000 and 2001, separately by academic field and treatment group. All other specifications as in the baseline estimation of the effort effect.

Figure 6: Heterogeneous Effort Effect - Citation Bins

Panels: (a) 10th (top) Decile; (b) 9th Decile; (c) 8th Decile; (d) 7th Decile; (e) 6th Decile; (f) Below Median

Notes: Results from separate regressions with citation quantile frequency variables as dependent variable. Panels are restricted to, respectively, the top five productivity deciles (sub-figures a-e) and below median productivity academics (sub-figure f). Productivity deciles are determined on the basis of the averages of the impact factor-rated number of publications over the three pre-announcement years 1999, 2000 and 2001, separately by academic field and treatment group. All other specifications as before.

Figure 7: Cosine Similarity Distributions

Panels: (a) Backward Cosine Similarity; (b) Forward Cosine Similarity

Notes: Results from separate regressions with backward cosine similarity quantile frequency variables as dependent variables in sub-figure a, and forward cosine similarity quantile frequency variables as dependent variables in sub-figure b. To generate these dependent variables, percentiles of backward and forward cosine similarity metrics are calculated separately by field and publication year and the percentile cut-offs are used to generate quantile frequency variables for each author and publication year. Further details in text. All other specifications as in the baseline estimation of the effort effect.

Figure 8: Quartiles Based on Impact Factor-Rated Number of Publications

Notes: Epanechnikov kernel-density estimates of the hazard function for switching to the performance pay scheme for academics in the top quartile (red line) and bottom three quartiles (blue line) of the average productivity distribution. Switches to performance pay are defined as any contract, position or affiliation change after 2005. Quartiles are determined based on three pre-implementation year averages (2002, 2003 and 2004) of the number of impact factor-rated publications. Quartiles are derived separately by academic field and tenure year. The sample is restricted to academics who held a tenured affiliation at a public university before 2005.

Table 1: Summary Statistics

Research productivity variables  N       Mean     SD       Median  Min  Max
Number of publications           108363  2.907    7.134    0       0    195
IF-rated publications            108363  9.874    31.38    0       0    1082.363
Citations                        108363  102.271  333.992  0       0    11086
Average IF-rating                48552   2.709    2.848    2.075   0    52.589
Average citations                48552   32.691   61.322   19.118  0    2759
Maximum citations                48552   99.164   216.016  42      0    8796
Minimum citations                48552   10.612   41.576   1       0    2759

Notes: The unit of observation is academic i. The sample is restricted to academics who started their first tenured affiliation at a German public university in 2002 to 2007 (excluding those with a foreign affiliation directly prior to this and dropping academics once they pass away or retire) and includes data from 1993 until and including 2012.

44 Table 2: Baseline Effort Effect Estimation

# Publications IF-rated publications Citations Average IF-rating Average citations

Post’02 * Treatment 0.168*** 0.133*** 0.129** -0.094*** -0.103*

(0.038) (0.051) (0.063) (0.035) (0.062)

Tenure * Treatment -0.011 0.026 0.017 0.025 -0.018

(0.040) (0.052) (0.064) (0.036) (0.067)

Number of Observations 83937 74326 78308 47052 47789

Number of Individuals 4671 4136 4357 3917 4110

Log Likelihood -136647.270 -338248.006 -4508200.290 -70677.097 -736179.301

Chi-squared 2863.231 2856.117 1101.473 611.818 476.042

Notes: The unit of observation is academic i. The sample is restricted to academics who started their first tenured affiliation at a German public university in 2002 to 2007 (excluding those with a foreign affiliation directly prior to this and dropping academics once they pass away or retire) and is estimated using data from the period 1993-2012. The dependent variables are, respectively, the number of publications, the impact factor-rated number of publications, the total number of citations to all publications published in a given year, the average impact factor rating of publications published in a given year, and the average number of citations to publications published in a given year. All dependent variables are defined for academic i in field f and year t, lagged by the average publication lag in field f as reported in Björk and Solomon (2013). Post’02 is 0 before 2002 and 1 thereafter, Tenure is 0 before an academic obtains their first tenured affiliation and 1 thereafter, and Treatment is 1 if an academic makes tenure at a public university in 2005, 2006 or 2007 and 0 if they make tenure at a public university in 2002, 2003 or 2004 (the control group). All specifications control for year and individual fixed effects and fifteen time-to-tenure fixed effects (from seven years before the tenure year to seven years after). Estimation is by conditional quasi-maximum likelihood of Poisson fixed effects models, with robust standard errors clustered at the individual level.
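Because the models in Table 2 are Poisson models with a log link, a coefficient β is commonly read as a proportional change of exp(β) - 1 in the conditional mean. The minimal sketch below applies that standard conversion to the Post’02 * Treatment coefficients copied from Table 2. The function name `percent_change` is illustrative only, and the conversion is a reading aid rather than a calculation reported in the paper.

```python
import math

# Coefficients on Post'02 * Treatment, copied from Table 2.
COEFFICIENTS = {
    "# Publications": 0.168,
    "IF-rated publications": 0.133,
    "Citations": 0.129,
    "Average IF-rating": -0.094,
    "Average citations": -0.103,
}


def percent_change(beta: float) -> float:
    """Percentage change in the conditional mean implied by a log-link coefficient."""
    return 100.0 * (math.exp(beta) - 1.0)


if __name__ == "__main__":
    for outcome, beta in COEFFICIENTS.items():
        print(f"{outcome}: {percent_change(beta):+.1f}%")
```

Under this reading, the coefficients for the number of publications and for IF-rated publications correspond to increases of roughly 18% and 14% respectively, which appears to match the range quoted in the abstract.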

Table 3: Heterogeneous Responses - Interactions

                                    # Publications   IF-rated publications   Citations      Average IF-rating   Average citations
Post’02 * Treatment                 0.280***         0.296**                 0.249*         -0.116              -0.107
                                    (0.071)          (0.116)                 (0.129)        (0.077)             (0.110)
Post’02 * Treatment * Top Decile    -0.152*          -0.199                  -0.063         0.038               0.161
                                    (0.087)          (0.132)                 (0.152)        (0.091)             (0.140)
Post’02 * Treatment * 9th Decile    -0.331***        -0.443***               -0.391**       -0.016              0.014
                                    (0.095)          (0.140)                 (0.163)        (0.096)             (0.147)
Post’02 * Treatment * 8th Decile    -0.152           -0.290*                 -0.497***      -0.008              -0.270
                                    (0.101)          (0.152)                 (0.182)        (0.094)             (0.181)
Post’02 * Treatment * 7th Decile    -0.022           -0.019                  -0.053         0.004               -0.222
                                    (0.107)          (0.155)                 (0.176)        (0.125)             (0.171)
Post’02 * Treatment * 6th Decile    -0.094           0.134                   0.252          0.263**             0.470***
                                    (0.124)          (0.217)                 (0.223)        (0.110)             (0.177)
Tenure * Treatment                  -0.032           0.093                   0.046          0.090               -0.042
                                    (0.079)          (0.115)                 (0.134)        (0.071)             (0.109)
Tenure * Treatment * Top Decile     -0.035           -0.150                  -0.199         -0.098              -0.143
                                    (0.080)          (0.114)                 (0.139)        (0.083)             (0.131)
Tenure * Treatment * 9th Decile     0.118            0.030                   0.152          -0.124              0.064
                                    (0.096)          (0.134)                 (0.172)        (0.083)             (0.136)
Tenure * Treatment * 8th Decile     0.071            0.020                   0.170          -0.059              0.102
                                    (0.092)          (0.130)                 (0.165)        (0.082)             (0.152)
Tenure * Treatment * 7th Decile     0.021            -0.107                  -0.086         0.014               0.226
                                    (0.101)          (0.139)                 (0.191)        (0.114)             (0.166)
Tenure * Treatment * 6th Decile     0.110            -0.049                  -0.004         -0.217**            -0.194
                                    (0.111)          (0.172)                 (0.199)        (0.096)             (0.165)
Number of Observations              80953            71540                   75504          45588               46324
Number of Individuals               4505             3981                    4201           3771                3966
Log Likelihood                      -131129.798      -321279.569             -4287805.762   -68430.475          -708085.051
Chi-squared                         3329.827         3418.293                1569.162       687.096             509.727

Notes: Results are from the estimation of the baseline model augmented with dummy variables for productivity deciles and their double and triple interactions with Post’02, Tenure and Treatment. Productivity deciles are determined on the basis of the averages of the impact factor-rated number of publications over the three pre-announcement years 1999, 2000 and 2001, separately by academic field and treatment group. All other specifications are as before.

Table 4: Selection Analysis

Panel A: Treatment versus Control Panel B: Placebo

1a 1b 2a 2b 3a 3b

Treatment 0.658 0.046 0.576 -0.207 -0.790 -1.653**

(0.401) (0.426) (0.412) (0.436) (0.624) (0.667)

Age -0.126*** -0.299*** -0.131*** -0.313*** -0.165*** -0.449***

(0.007) (0.017) (0.007) (0.017) (0.010) (0.040)

Avg Productivity -0.001 -0.004* -0.038*** -0.056*** -0.003 -0.006

(0.002) (0.002) (0.015) (0.013) (0.014) (0.015)

Age * Treatment -0.028*** -0.007 -0.026*** -0.001 0.013 0.034**

(0.009) (0.009) (0.009) (0.010) (0.013) (0.014)

Avg Productivity * Treatment 0.004** 0.007*** 0.032** 0.051*** -0.008 -0.005

(0.002) (0.002) (0.015) (0.014) (0.015) (0.016)

Avg Productivity * Age 0.001*** 0.001*** 0.000 0.000

(0.000) (0.000) (0.000) (0.000)

Avg Productivity * Age * Treatment -0.001* -0.001*** 0.000 0.000

(0.000) (0.000) (0.000) (0.000)

Age at Tenure 0.192*** 0.200*** 0.293***

(0.015) (0.016) (0.038)

Constant 2.381*** 1.599*** 2.576*** 1.924*** 3.667*** 2.721***

(0.329) (0.347) (0.334) (0.350) (0.449) (0.479)

Number of Observations 80131 80131 80131 80131 51431 51431

Number of Subjects 14972 14972 14972 14972 6960 6960

Number of Switches 2435 2435 2435 2435 1099 1099

Log Likelihood -7545.484 -7404.981 -7541.479 -7394.569 -3232.134 -3178.716

Chi-squared 1365.450 1310.708 1378.937 1326.956 595.064 557.869

Rho 1.345 1.653 1.345 1.671 1.439 2.516

Notes: The unit of observation is academic i. The table reports estimation results of Weibull proportional hazard models of selection into performance pay. Any first affiliation, position or contract change after implementation of the pay reform (as of 2005) is considered a switch to performance pay (the “failure” event) and the time until such a switch is used as duration variable. Academics are “at risk” of switching after their most recent affiliation/position/contract change if they are tenured, at a public university and not retired. In Panel A, the treatment variable is 1 for academics who have made tenure before 2005 and 0 for those who make tenure afterwards. Panel B reports the results for a placebo experiment where the placebo-treatment group comprises academics who start their first tenured position before 2002, while academics who start their first tenured affiliation between 2002 and 2005 act as placebo-control group. “Avg Productivity” is calculated as three year pre-implementation averages (2002-2004) of the impact factor-rated number of publications. The age variable is equal to an author’s self-reported age if known, and equal to a synthetic age otherwise. The synthetic age is calculated using the average age at habilitation, promotion or tenure. All specifications include academic field fixed effects and are estimated for years t>2004 and for academics who start their first tenured affiliation as of 1999 (excluding those with a foreign affiliation directly prior to this). The models in the “b” columns also control for synthetic age at tenure. Standard errors are robust and clustered by individual academic.
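As a reading aid for Table 4, the sketch below writes out one common parameterization of a Weibull proportional hazard model, h(t | x) = ρ t^(ρ-1) exp(x'β). It rests on two assumptions that the table notes do not confirm: that the reported "Rho" is the Weibull shape parameter ρ, and that the table entries are coefficients β rather than hazard ratios (the negative entries suggest coefficients). The function names are illustrative.

```python
import math


def weibull_ph_hazard(t: float, shape: float, linear_index: float) -> float:
    """Hazard h(t | x) = shape * t**(shape - 1) * exp(x'beta) for duration t > 0."""
    if t <= 0:
        raise ValueError("duration must be positive")
    return shape * t ** (shape - 1.0) * math.exp(linear_index)


def hazard_ratio(beta: float, delta: float = 1.0) -> float:
    """Proportional shift in the hazard when a covariate rises by `delta`."""
    return math.exp(beta * delta)


if __name__ == "__main__":
    # Age coefficient in column 1a of Table 4 (assumed to be a coefficient).
    print(f"Hazard ratio per extra year of age: {hazard_ratio(-0.126):.3f}")
    # With shape > 1 the baseline hazard rises with duration.
    print(weibull_ph_hazard(1.0, 1.345, 0.0), weibull_ph_hazard(2.0, 1.345, 0.0))
```

Under these assumptions, a shape value above one, as in every column of Table 4, would indicate a switching hazard that increases with time at risk.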

Appendix

A1 Model of Performance Pay and Multi-Tasking

This Appendix provides a detailed exposition of the model outlined in the main text. The set-up of the model is provided in the main text and not repeated here. I first discuss the full insurance case as a benchmark before presenting the performance pay case. I then derive implications for the effort and selection effects of performance pay, by comparing the equilibrium behavior under the latter to that under a flat wage system. Finally, I show that equivalent effort and selection results hold for an incentive system that features career concerns only, and that the results for selection into performance pay carry over to selection into academia and other labor markets featuring such performance pay.

A1.1 Flat Wage

Suppose that principals can offer flat wage contracts only and, moreover, that they cannot tailor wages to their beliefs about agent ability or effort. This is the case in the German age-related pay system, in which principals (universities) do not have discretion over wage contracts, and would apply to other markets in which wage contracts are similarly constrained. With a per period discount factor δ, the expected life-time utility of an agent is given by:

$$U = E\left\{ -\exp\left[ -r \sum_{t=0}^{\infty} \delta^{t} \left( w_t - c(\vec{e}_t) \right) \right] \right\}$$

In this benchmark flat wage case, pay $w_t \geq 0$ does not depend on output or effort in any period, so an agent’s payoff in period t in certainty equivalents is then:

$$u_t(w_t, \vec{e}_t) = E\left\{ w_t - c(\vec{e}_t) \right\} = w_t - c(\vec{e}_t) \tag{7}$$

It follows that optimal effort in any period t equals minimum effort levels:

$$\arg\max_{\vec{e}} \left[ u_t(w_t, \vec{e}_t) \right] = \arg\max_{\vec{e}} \left\{ w_t - c(\vec{e}_t) \right\} = \bar{\vec{e}} \tag{8}$$

Any differences in effort therefore reflect differences in intrinsic motivation, minimum output requirements such as tenure requirements and other such mechanisms. Suppose the agent’s outside option yields per period utility $u$. Principals then need to set the wage $w_t$ such that $E\{w_t - c(\vec{e}_t^{\,*})\} = w_t \geq u$ in order to attract agents. In equilibrium therefore $w_t^{*} = u$.

A1.2 Career Concerns and Incentive Contracts

Consider now the case of a perfectly competitive labor market in which only short-term contracts are feasible. As in Gibbons and Murphy (1992), I restrict attention to linear contracts of the form

$w_t(\vec{y}_t) = c_t + \vec{b}_t^{T} \vec{y}_t$. This not only ensures tractability, but Holmstrom and Milgrom (1991) also show that optimal contracts are linear in a setting with comparable assumptions about agent preferences and output noise. In each period, the timing of actions is as follows: principals offer a contract ($w_t$) to agents; agents pick the contract that yields the highest expected utility; agents choose and exert effort; output materializes, principals receive the output produced by agents they employ and agents are paid according to their contract terms.

Because output is observable to all market participants, the market can use this information to update its beliefs about agent ability. Given the assumptions on ability and output noise, $\vec{y}_t$ is bivariate normal. The prior distribution of $\theta$ is normal as well, and hence so is the posterior distribution of $\theta$. Using well-known formulas in De Groot (1970), the mean and variance of this posterior distribution of $\theta$, given past output $(\vec{y}_0, .., \vec{y}_{t-1})$ and conjectured effort levels $(\hat{\vec{e}}_0, .., \hat{\vec{e}}_{t-1})$, are given by, respectively,

$$m_t := E\left[ \theta \mid (\vec{y}_0, .., \vec{y}_{t-1}); (\hat{\vec{e}}_0, .., \hat{\vec{e}}_{t-1}) \right] = \frac{\sigma_\varepsilon^2 \sigma_\nu^2 m_0 + \sigma_0^2 \sum_{s=0}^{t-1} \left[ \sigma_\nu^2 (y_{p,s} - \hat{e}_{p,s}) + \sigma_\varepsilon^2 (y_{q,s} - \hat{e}_{q,s}) \right]}{\sigma_\varepsilon^2 \sigma_\nu^2 + t \sigma_0^2 (\sigma_\varepsilon^2 + \sigma_\nu^2)} \tag{9}$$

and

$$\sigma_t^2 := \mathrm{var}(\theta_t) = \frac{\sigma_0^2 \sigma_\varepsilon^2 \sigma_\nu^2}{\sigma_\varepsilon^2 \sigma_\nu^2 + t \sigma_0^2 (\sigma_\varepsilon^2 + \sigma_\nu^2)} \tag{10}$$

Perfect competition in the labor market implies that agents are offered contracts that will earn them their expected productivity, that is:

$$E\left[ \mathbf{1}^{T} \vec{y}_t \right] = (m_t + \hat{e}_{p,t}) + (m_t + \hat{e}_{q,t}) = E\left[ w_t(\vec{y}_t) \right] = c_t + \vec{b}_t^{T} \vec{y}_t \tag{11}$$

Here $\mathbf{1}^{T}$ denotes a 1x2 matrix of ones (see footnote 36). An agent’s expected lifetime utility is then given by

$$U = E\left\{ -\exp\left[ -r \sum_{t=0}^{\infty} \delta^{t} \left( c_t + \vec{b}_t^{T} \vec{y}_t - c(\vec{e}_t) \right) \right] \right\} = E\left\{ -\exp\left[ -r \sum_{t=0}^{\infty} \delta^{t} \left( (m_t + e_{p,t}) + (m_t + e_{q,t}) - c(\vec{e}_t) \right) \right] \right\} \tag{12}$$

In any period t, effort affects the payoff that period, through the explicit bonus $\vec{b}_t$, as well as future wage payments through updated beliefs about ability. If only the latter, career concerns incentives are present, we know, following Holmström (1999) and using 9 and 12, that optimal effort is given by the following first order conditions

$$\frac{\partial c(\vec{e}_t)}{\partial e_{p,t}} = \sigma_\nu^2 CC_t; \qquad \frac{\partial c(\vec{e}_t)}{\partial e_{q,t}} = \sigma_\varepsilon^2 CC_t \tag{13}$$

where $CC_t = 2 \sum_{\tau=0}^{\infty} \delta^{\tau} \frac{\sigma_0^2}{\sigma_\varepsilon^2 \sigma_\nu^2 + (t+\tau) \sigma_0^2 (\sigma_\varepsilon^2 + \sigma_\nu^2)}$. As noted in Macleod (n.d.), adding explicit performance incentives to career concerns in this model does not affect the career concerns incentives. Ability enters output additively and does not affect the marginal cost of effort, so the optimal bonus $\vec{b}_t$ does not depend on $m_t$, as will be shown below. Future income risk is therefore

36 This vector of ones implies that principals’ return to the output quantity and quality (signal) is equal to this output (signal). It is straightforward to allow for other rates of return and all the model’s results would continue to hold.

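unaffected by effort, and career concerns incentives are unaffected by the introduction of explicit performance incentives. The first order conditions for optimal effort are thus

$$\frac{\partial c(\vec{e}_t)}{\partial e_{p,t}} = (e_{p,t} - \bar{e}_{p,t}) + d \left( e_{q,t} - \bar{e}_{q,t} \right) = b_{p,t} + \sigma_\nu^2 CC_t \tag{14}$$

$$\frac{\partial c(\vec{e}_t)}{\partial e_{q,t}} = \left( e_{q,t} - \bar{e}_{q,t} \right) + d (e_{p,t} - \bar{e}_{p,t}) = b_{q,t} + \sigma_\varepsilon^2 CC_t \tag{15}$$

Substituting 14 in 15 and rearranging yields the following expressions for optimal effort

$$e_{p,t}^{*} = \bar{e}_{p} + \frac{1}{1 - d^2} \left[ b_{p,t} + \sigma_\nu^2 CC_t - d \left( b_{q,t} + \sigma_\varepsilon^2 CC_t \right) \right] \tag{16}$$

$$e_{q,t}^{*} = \bar{e}_{q} + \frac{1}{1 - d^2} \left[ b_{q,t} + \sigma_\varepsilon^2 CC_t - d \left( b_{p,t} + \sigma_\nu^2 CC_t \right) \right] \tag{17}$$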
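The step from (14) and (15) to (16) and (17) is the solution of a two-equation linear system in the effort deviations. The sketch below, offered only as a numerical check, evaluates the closed forms and confirms that they leave zero residuals in both first order conditions. All parameter values are arbitrary illustrations, the function names are hypothetical, and the quality equation uses baseline quality effort, as (15) implies.

```python
def closed_form_effort(b_p, b_q, cc, var_eps, var_nu, d, e_bar_p=0.0, e_bar_q=0.0):
    """Optimal effort from (16)-(17), given bonuses, CC_t and substitution parameter d."""
    marginal_p = b_p + var_nu * cc   # right-hand side of (14)
    marginal_q = b_q + var_eps * cc  # right-hand side of (15)
    e_p = e_bar_p + (marginal_p - d * marginal_q) / (1.0 - d ** 2)
    e_q = e_bar_q + (marginal_q - d * marginal_p) / (1.0 - d ** 2)
    return e_p, e_q


def foc_residuals(e_p, e_q, b_p, b_q, cc, var_eps, var_nu, d, e_bar_p=0.0, e_bar_q=0.0):
    """Left-hand side minus right-hand side of (14) and (15)."""
    res_p = (e_p - e_bar_p) + d * (e_q - e_bar_q) - (b_p + var_nu * cc)
    res_q = (e_q - e_bar_q) + d * (e_p - e_bar_p) - (b_q + var_eps * cc)
    return res_p, res_q


if __name__ == "__main__":
    # Illustrative values only; none of these are taken from the paper.
    params = dict(b_p=0.4, b_q=0.2, cc=0.3, var_eps=1.0, var_nu=1.5, d=0.5,
                  e_bar_p=1.0, e_bar_q=2.0)
    e_p, e_q = closed_form_effort(**params)
    print(e_p, e_q, foc_residuals(e_p, e_q, **params))
```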

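To derive the optimal bonus $\vec{b}_t^{\,*}$, I again use that $\vec{b}_t$ affects only the effort and income risk in period t and is therefore chosen to maximize agent utility that period. This amounts to optimizing the certainty equivalent with respect to $\vec{b}_t$

$$\vec{b}_t^{\,*} = \arg\max_{\vec{b}_t} \left\{ m_t + e_{p,t}(\vec{b}_t) + m_t + e_{q,t}(\vec{b}_t) - c\left( \vec{e}_t(\vec{b}_t) \right) - \frac{r}{2} \vec{b}_t^{\,2} \Sigma_t^2 \right\}$$

where $\Sigma_t^2 = \begin{bmatrix} \sigma_t^2 + \sigma_\varepsilon^2 & \sigma_t^2 \\ \sigma_t^2 & \sigma_t^2 + \sigma_\nu^2 \end{bmatrix}$. The resulting first order conditions are

$$\left( 1 - \frac{\partial c(\vec{e}_t)}{\partial e_{p,t}} \right) \frac{\partial e_{p,t}(\vec{b}_t)}{\partial b_{p,t}} + \left( 1 - \frac{\partial c(\vec{e}_t)}{\partial e_{q,t}} \right) \frac{\partial e_{q,t}(\vec{b}_t)}{\partial b_{p,t}} = r \left[ b_{p,t} \left( \sigma_t^2 + \sigma_\varepsilon^2 \right) + b_{q,t} \sigma_t^2 \right] \tag{18}$$

$$\left( 1 - \frac{\partial c(\vec{e}_t)}{\partial e_{q,t}} \right) \frac{\partial e_{q,t}(\vec{b}_t)}{\partial b_{q,t}} + \left( 1 - \frac{\partial c(\vec{e}_t)}{\partial e_{p,t}} \right) \frac{\partial e_{p,t}(\vec{b}_t)}{\partial b_{q,t}} = r \left[ b_{q,t} \left( \sigma_t^2 + \sigma_\nu^2 \right) + b_{p,t} \sigma_t^2 \right] \tag{19}$$

Using 14 and 15 to substitute for the terms on the left-hand side both directly and using the implicit function theorem, we get

$$b_{p,t}^{*} = \frac{1 - \sigma_\nu^2 CC_t - r b_{q,t}^{*} \sigma_t^2}{1 + r \left( \sigma_t^2 + \sigma_\varepsilon^2 \right)}$$

$$b_{q,t}^{*} = \frac{1 - \sigma_\varepsilon^2 CC_t - r b_{p,t}^{*} \sigma_t^2}{1 + r \left( \sigma_t^2 + \sigma_\nu^2 \right)}$$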
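The two conditions above form a linear system in the two bonuses. As a hedged numerical check, the sketch below iterates the two conditions from zero bonuses until they settle and compares the result with the closed-form solution of that system, which is what (20) and (21) state. Parameter values are arbitrary and the function names are illustrative; the conditions are taken exactly as printed.

```python
def closed_form_bonuses(cc, var_t, var_eps, var_nu, r):
    """Closed-form solution of the two bonus conditions, as in (20)-(21)."""
    a = 1.0 + r * (var_t + var_eps)
    b = 1.0 + r * (var_t + var_nu)
    s = r * var_t
    denom = a * b - s ** 2
    b_p = (b * (1.0 - var_nu * cc) - s * (1.0 - var_eps * cc)) / denom
    b_q = (a * (1.0 - var_eps * cc) - s * (1.0 - var_nu * cc)) / denom
    return b_p, b_q


def iterated_bonuses(cc, var_t, var_eps, var_nu, r, rounds=200):
    """Iterate the two conditions simultaneously, starting from zero bonuses."""
    b_p, b_q = 0.0, 0.0
    for _ in range(rounds):
        b_p, b_q = (
            (1.0 - var_nu * cc - r * b_q * var_t) / (1.0 + r * (var_t + var_eps)),
            (1.0 - var_eps * cc - r * b_p * var_t) / (1.0 + r * (var_t + var_nu)),
        )
    return b_p, b_q


if __name__ == "__main__":
    # Illustrative values only; none of these are taken from the paper.
    params = dict(cc=0.3, var_t=0.5, var_eps=1.0, var_nu=1.5, r=2.0)
    print(closed_form_bonuses(**params))
    print(iterated_bonuses(**params))
```

The iteration converges because each condition responds to the other bonus with a slope smaller than one in absolute value whenever r and the variances are positive.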

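Substituting one into the other yields

$$b_{p,t}^{*} = \frac{\left[ 1 + r \left( \sigma_t^2 + \sigma_\nu^2 \right) \right] \left( 1 - \sigma_\nu^2 CC_t \right) - r \sigma_t^2 \left( 1 - \sigma_\varepsilon^2 CC_t \right)}{\left[ 1 + r \left( \sigma_t^2 + \sigma_\varepsilon^2 \right) \right] \left[ 1 + r \left( \sigma_t^2 + \sigma_\nu^2 \right) \right] - \left( r \sigma_t^2 \right)^2} \tag{20}$$

$$b_{q,t}^{*} = \frac{\left[ 1 + r \left( \sigma_t^2 + \sigma_\varepsilon^2 \right) \right] \left( 1 - \sigma_\varepsilon^2 CC_t \right) - r \sigma_t^2 \left( 1 - \sigma_\nu^2 CC_t \right)}{\left[ 1 + r \left( \sigma_t^2 + \sigma_\varepsilon^2 \right) \right] \left[ 1 + r \left( \sigma_t^2 + \sigma_\nu^2 \right) \right] - \left( r \sigma_t^2 \right)^2} \tag{21}$$

It follows from the assumptions on $c(\vec{e}_t)$ and the fact that the right-hand sides depend on the model’s parameters only that 20 and 21 define the unique optimal bonuses in each period. Finally, substituting 20 and 21 into 16 and 17 and rearranging, we get the following expressions for optimal effort:

$$e_{p,t}^{*} = \bar{e}_{p,t} + \frac{(1 - d) + r \left( \sigma_\nu^2 - d \sigma_\varepsilon^2 \right)}{(1 - d^2) D} \left( 1 + r CC_t \sigma_\varepsilon^2 \sigma_\nu^2 \right) \tag{22}$$

$$e_{q,t}^{*} = \max\left[ 0, \; \bar{e}_{q,t} + \frac{(1 - d) + r \left( \sigma_\varepsilon^2 - d \sigma_\nu^2 \right)}{(1 - d^2) D} \left( 1 + r CC_t \sigma_\varepsilon^2 \sigma_\nu^2 \right) \right] \tag{23}$$

where $D := 1 + r \left( 2 \sigma_t^2 + \sigma_\varepsilon^2 + \sigma_\nu^2 \right) + r^2 \left( \sigma_t^2 \left( \sigma_\varepsilon^2 + \sigma_\nu^2 \right) + \sigma_\varepsilon^2 \sigma_\nu^2 \right)$ and where I have used that $\vec{e}_t = (e_{p,t}, e_{q,t}) \geq \vec{0}$ on the last line. By the same reasoning as used for optimal bonuses above, it follows that 22 and 23 define the unique optimal effort levels. What remains to be shown is that the market’s conjectured effort levels are correct in equilibrium: $\hat{\vec{e}}_t = \vec{e}_t^{\,*}$. It is immediate that this condition is met, since $\vec{e}_t^{\,*}$ does not depend on $\hat{\vec{e}}_t$. Proposition 1 provides a formal characterization of the equilibrium derived above:

Proposition 1 - Equilibrium Performance Pay Contracts and Effort: In a perfectly competitive labor market in which only short-term contracts are feasible, where agents face both career concerns and linear performance pay contracts, and the cost of effort is given by 1 with ∗ ∗ 0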
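Two ingredients of the equilibrium above, the posterior variance in (10) and the career concerns term $CC_t$, are easy to evaluate numerically. The sketch below computes (10) directly and cross-checks it against period-by-period precision updating, which assumes that each period contributes one quantity signal with noise variance $\sigma_\varepsilon^2$ and one quality signal with noise variance $\sigma_\nu^2$. It also approximates $CC_t$ by truncating the infinite sum as that definition is read here. Function names, the truncation horizon and all parameter values are illustrative assumptions.

```python
def posterior_variance(t, var_0, var_eps, var_nu):
    """Posterior variance of ability after t periods, equation (10)."""
    return var_0 * var_eps * var_nu / (var_eps * var_nu + t * var_0 * (var_eps + var_nu))


def posterior_variance_sequential(t, var_0, var_eps, var_nu):
    """Same quantity via precision updating: two independent signals per period."""
    precision = 1.0 / var_0
    for _ in range(t):
        precision += 1.0 / var_eps + 1.0 / var_nu
    return 1.0 / precision


def career_concerns_term(t, delta, var_0, var_eps, var_nu, horizon=2000):
    """Truncated approximation of CC_t; `horizon` is an arbitrary cut-off."""
    total = 0.0
    for tau in range(horizon):
        total += delta ** tau * var_0 / (
            var_eps * var_nu + (t + tau) * var_0 * (var_eps + var_nu)
        )
    return 2.0 * total


if __name__ == "__main__":
    # Illustrative values only; none of these are taken from the paper.
    print(posterior_variance(5, 1.0, 1.0, 1.5))
    print(posterior_variance_sequential(5, 1.0, 1.0, 1.5))
    print(career_concerns_term(0, 0.95, 1.0, 1.0, 1.5))
```

With a discount factor below one, the omitted tail of the sum shrinks geometrically, so the truncation error is negligible for the horizon used here.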

A1.3 Implications

Comparing the performance pay equilibrium characterized in Proposition 1 to the flat wage equilibrium in subsection A1.1 yields a number of testable implications for output quantity and quality, and selection into performance pay.

First, I compare effort under the two pay schemes. We may assume, without loss of generality, that one of the output dimensions is more risky or less precisely measured. In academia, as in many other professional jobs, output quality is noisier than quantity, and hence $\sigma_\nu^2 > \sigma_\varepsilon^2$. Because $0 < d < 1$ and agents are risk averse, it follows that $e_{p,t}^{*} > \bar{e}_{p,t}$ if $(1 - d) + r \left( \sigma_\nu^2 - d \sigma_\varepsilon^2 \right) > 0$, while $e_{q,t}^{*} > \bar{e}_{q,t}$ if $(1 - d) + r \left( \sigma_\varepsilon^2 - d \sigma_\nu^2 \right) > 0$. The former always holds, while the latter is true for

$$d < \frac{1 + r \sigma_\varepsilon^2}{1 + r \sigma_\nu^2} \leq 1 \tag{24}$$

That is, effort for output quantity is unambiguously larger under performance pay, whereas effort towards quality is larger under performance pay only if quantity and quality are sufficiently weak substitutes. With $\frac{\partial d}{\partial m_0} < 0$, (24) is fulfilled for $m_0 > \check{m}$ where $\check{m}$ is such that $d(\check{m}) < \frac{1 + r \sigma_\varepsilon^2}{1 + r \sigma_\nu^2}$. If the quality measure is very noisy, there may not be any agent ability class that increases quality effort; specifically, if $d(\check{m}) < \frac{1 + r \sigma_\varepsilon^2}{1 + r \sigma_\nu^2} < d(m)$. Conversely, with very little quality noise and a low degree of substitution across ability classes, if $d(m_0) < d(\check{m}) < \frac{1 + r \sigma_\varepsilon^2}{1 + r \sigma_\nu^2}$, all agents increase quality effort.

This leads to the second implication, for heterogeneous effort responses. The derivatives of quantity and quality effort with respect to ability class $m_0$ are

$$\frac{\partial e_{p,t}^{*}}{\partial m_0} = - \frac{\left( 1 + d^2 \right) \left( 1 + r \sigma_\varepsilon^2 \right) - 2d \left( 1 + r \sigma_\nu^2 \right)}{\left( 1 - d^2 \right)^2 D} \left( 1 + r CC_t \sigma_\varepsilon^2 \sigma_\nu^2 \right) \frac{\partial d}{\partial m_0} \tag{25}$$

$$\frac{\partial e_{q,t}^{*}}{\partial m_0} = - \frac{\left( 1 + d^2 \right) \left( 1 + r \sigma_\nu^2 \right) - 2d \left( 1 + r \sigma_\varepsilon^2 \right)}{\left( 1 - d^2 \right)^2 D} \left( 1 + r CC_t \sigma_\varepsilon^2 \sigma_\nu^2 \right) \frac{\partial d}{\partial m_0} \tag{26}$$

The first derivative is positive if $\frac{1 + r \sigma_\varepsilon^2}{1 + r \sigma_\nu^2} > \frac{2d}{1 + d^2}$ while the second is positive if $\frac{1 + r \sigma_\nu^2}{1 + r \sigma_\varepsilon^2} > \frac{2d}{1 + d^2}$. Given that $\sigma_\nu^2 \geq \sigma_\varepsilon^2$ and $0 < d < 1$, the latter condition is met for all ability classes. Combined with (24), this implies that the highest ability agents increase quality effort the most or decrease it the least. Furthermore, using (24) and (23), it follows that low ability agents with $m_0 \in [m, \check{m}]$ may end up at a corner solution where, since effort cannot be negative, agents exert no quality effort under performance pay. Depending on $\bar{e}_q(m_0)$ of such agents, the introduction of performance pay either decreases quality effort and hence quality (if $\bar{e}_q(m_0) > 0$) or does not change quality effort (if $\bar{e}_q(m_0) = 0$).

To evaluate the sign of (25), define $m_0 = \hat{m}$ such that $\frac{1 + r \sigma_\varepsilon^2}{1 + r \sigma_\nu^2} = \frac{2d}{1 + d^2}$. Because $\frac{\partial}{\partial m_0} \left( \frac{2d}{1 + d^2} \right) < 0$, it follows that $\frac{\partial e_{p,t}^{*}}{\partial m_0} > 0$ for $m_0 > \hat{m}$. Combined with the earlier finding that $e_{p,t}^{*} > \bar{e}_{p,t}, \forall m_0$, this implies that agents for whom $m_0 = \hat{m}$ increase quantity effort the least when performance pay is introduced, while agents of both lower and higher ability classes increase quantity effort more. Furthermore, since $\hat{m}$ increases with $\sigma_\nu^2$, the least responsive ability class is closer to the highest ability class, the noisier is the quality signal. Moreover, because $d < \frac{2d}{1 + d^2}$, the condition for a positive quality effort response (24) is less restrictive than the condition for a positive derivative of quantity effort and hence $\check{m} < \hat{m}$. That is, the lowest ability class that has a positive quality effort response - if such an ability class exists - is of lower ability than the ability class that increases quantity effort the least. If the quality measure is very noisy, such that no agent ability class increases quality effort, quantity effort decreases monotonically with prior ability levels.
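The two cut-offs in this argument can be made concrete in terms of the substitution parameter d. The sketch below computes the largest d that satisfies (24) and the value of d at which $2d / (1 + d^2)$ equals the same noise ratio, which corresponds to the least responsive ability class. Since d is assumed to fall with ability, a smaller cut-off in d maps to a higher ability class. Function names and parameter values are illustrative assumptions.

```python
import math


def noise_ratio(r, var_eps, var_nu):
    """Ratio (1 + r * var_eps) / (1 + r * var_nu) on the right-hand side of (24)."""
    return (1.0 + r * var_eps) / (1.0 + r * var_nu)


def quality_cutoff_d(r, var_eps, var_nu):
    """Quality effort rises under performance pay only for d below this value."""
    return noise_ratio(r, var_eps, var_nu)


def least_responsive_d(r, var_eps, var_nu):
    """Root in (0, 1] of 2d / (1 + d^2) = noise ratio, i.e. d at ability class m-hat."""
    k = noise_ratio(r, var_eps, var_nu)
    if k >= 1.0:
        return 1.0
    return (1.0 - math.sqrt(1.0 - k ** 2)) / k


if __name__ == "__main__":
    # Illustrative values only: quality noise exceeds quantity noise.
    r, var_eps, var_nu = 1.0, 1.0, 1.5
    print(quality_cutoff_d(r, var_eps, var_nu))
    print(least_responsive_d(r, var_eps, var_nu))
```

In this example the least responsive class sits at a lower d than the quality cut-off, which is the ordering $\check{m} < \hat{m}$ stated in the text once d is mapped back to ability.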

Thirdly, the model allows me to derive hypotheses about selection. Agents prefer performance pay in t over a flat wage if the certainty equivalent utility in the former system ($CE_t^{pp}$) exceeds that in the latter ($CE_t^{fw}$) as of period t, that is, if:

$$CE_t^{pp}(m_0) = \sum_{t=0}^{\infty} \left\{ \left( m_t + e_{p,t}^{*} \right) + \left( m_t + e_{q,t}^{*} \right) - c\left( \vec{e}_t^{\,*} \right) - \frac{r}{2} \vec{b}_t^{\,*2} \Sigma_t^2 \right\} > \sum_{t=0}^{\infty} \left\{ w_t \right\} = CE_t^{fw} \tag{27}$$

The derivative of the left-hand side with respect to ability class m0 is

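$$\sum_{t=0}^{\infty} \left\{ \frac{2 \sigma_\varepsilon^2 \sigma_\nu^2}{\sigma_\varepsilon^2 \sigma_\nu^2 + t \sigma_0^2 \left( \sigma_\varepsilon^2 + \sigma_\nu^2 \right)} + \left( \frac{\partial e_{p,t}^{*}}{\partial m_0} + \frac{\partial e_{q,t}^{*}}{\partial m_0} \right) - 2 \left( e_{p,t}^{*} - \bar{e}_{p,t} \right) \left( e_{q,t}^{*} - \bar{e}_{q,t} \right) \frac{\partial d}{\partial m_0} \right\} \tag{28}$$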

where I have used that $\vec{b}_t^{\,*}$ does not depend on $m_0$. It follows from the foregoing discussion that a sufficient condition for this derivative to be positive is for $\frac{1 + r \sigma_\varepsilon^2}{1 + r \sigma_\nu^2} > \frac{2d}{1 + d^2}$ and thus for $m_0 > \hat{m}$. If (28) is ever negative, this occurs for $\frac{1 + r \sigma_\varepsilon^2}{1 + r \sigma_\nu^2} < \frac{2d}{1 + d^2}$ and hence for $m_0 < \hat{m}$ (necessary, not sufficient). If some, but not all agents prefer performance pay, it must be that at least one $m_0 = \tilde{m}_t$ exists such that (27) holds with equality for $d = d(\tilde{m}_t)$.

If $\tilde{m}_t > \hat{m}$, all agents with $m_0 > \tilde{m}_t$ switch to performance pay in t because (28) is positive. As a check, note that $CE_t^{pp} \to c$ for $d \to 0$, with $c$ finite. To see this, substitute 1 into $CE_t^{pp}$ and use that $e_{p,t}^{*}$, $e_{q,t}^{*}$, their squares and cross-product all converge to finite constants. If $CE_t^{pp}(\tilde{m}_t) = CE_t^{fw}$, it follows that $c > CE_t^{fw}$ for $m_0 > \tilde{m}_t$ because 28 is necessarily positive for $m_0 > \hat{m}$.

If $\tilde{m}_t < \hat{m}$, then (28) is either positive or negative. If positive, we again have that agents with $m_0 > \tilde{m}_t$ switch to performance pay. If (28) is negative, agents with $m_0 < \tilde{m}_t$ would switch to performance pay. But $CE_t^{pp} \to -\infty$ for $d \to 1$. To see this, note that $e_{p,t}^{*} \to \infty$ while $-\left( e_{p,t}^{*} \right)^2 \to -\infty$ at a faster rate, and $e_{q,t}^{*} \to \max[0, c']$ where $c'$ is finite and given by the second term in brackets in (23). This is a contradiction, because if (27) holds with equality for $d = d(\tilde{m}_t)$ and (28) is negative at $d = d(\tilde{m}_t)$, then for $d \to 1$ and hence $\tilde{m}_t > m_0 \to m$, $CE_t^{pp}(m_0)$ should exceed $CE_t^{pp}(\tilde{m}_t) = CE_t^{fw} \geq 0$. But then, no $m_0 = \tilde{m}_t$ exists such that (27) holds with equality and (28) is negative. Hence, if there are agents who prefer performance pay, these are agents with $m_0 > \tilde{m}_t$. Furthermore, it follows immediately from the foregoing that if $w_t' > w_t$, then $\tilde{m}_t < \tilde{m}_t'$. Thus, only agents from higher ability classes select into performance pay, and agents that face higher flat wages need to be of even higher ability class to switch to performance pay.
Proposition 2 summarizes the three testable implications derived above:

Proposition 2 - Effort and Selection Effects of Performance Pay: Comparing the performance pay equilibrium characterized in Proposition 1 to the flat wage equilibrium derived in section A1.1, the following is true: 1) quantity effort is unambiguously higher in the performance pay system, while quality effort is higher only if $d < \frac{1 + r \sigma_\varepsilon^2}{1 + r \sigma_\nu^2}$; 2) agents with the highest prior ability increase quality effort the most or decrease it the least, agents with prior ability $m_0 = \hat{m}$ such that $\frac{1 + r \sigma_\varepsilon^2}{1 + r \sigma_\nu^2} = \frac{2 d(\hat{m})}{1 + d^2(\hat{m})}$ increase quantity effort the least, and if no agent ability class increases quality then the quantity effort increase decreases monotonically with prior ability; 3) high ability agents, for whom $m_0 \geq \tilde{m}_t$, where $\tilde{m}_t$ is such that $CE_t^{pp}(\tilde{m}_t) = CE_t^{fw}$, select into performance pay.

A1.4 Robustness

In this section I show that the selection results derived in the previous section carry over to selection into academia and other labor markets featuring similar incentives more broadly, and that all results go through with career concerns only. The former is trivially true, since $w_t^{*} = u$ and hence $CE_t^{fw} = \sum_{t=0}^{\infty} \{w_t\} = \sum_{t=0}^{\infty} \{u\} = CE_t^{o}$, where $CE_t^{o}$ denotes the certainty equivalent utility of an agent’s outside option. Thus, all results regarding selection into performance pay relative to flat wages hold for selection into an academic system with performance pay relative to an outside option. That is, I expect only sufficiently high ability agents to select into academia, just as the model predicts that only academics of sufficiently high ability switch to performance pay upon its introduction into academia.

To see that the effort and selection effect results derived in the preceding section continue to hold with career concerns only (i.e. absent short-term performance contracts), recall that the first order condition for optimal effort is $\frac{\partial c(\vec{e}_t)}{\partial e_{p,t}} = \sigma_\nu^2 CC_t; \; \frac{\partial c(\vec{e}_t)}{\partial e_{q,t}} = \sigma_\varepsilon^2 CC_t$ in this case. Substituting for $\frac{\partial c(\vec{e}_t)}{\partial e_{p,t}}$ and $\frac{\partial c(\vec{e}_t)}{\partial e_{q,t}}$ as is done in 14 and 15, then substituting the first order conditions one into the other and rearranging yields the following expressions for equilibrium effort:

$$e_{p,t}^{*} = \bar{e}_{p} + \frac{CC_t}{1 - d^2} \left( \sigma_\nu^2 - d \sigma_\varepsilon^2 \right) \tag{29}$$

$$e_{q,t}^{*} = \bar{e}_{q} + \frac{CC_t}{1 - d^2} \left( \sigma_\varepsilon^2 - d \sigma_\nu^2 \right) \tag{30}$$

Because $\sigma_\nu^2 > \sigma_\varepsilon^2$, effort exerted towards quantity is unambiguously higher compared to a flat wage system, while quality effort is higher only if $d < \sigma_\varepsilon^2 / \sigma_\nu^2$, and hence for agents in sufficiently high ability classes. This result is directly equivalent to the first implication in Proposition 2.

The derivatives of quantity and quality effort with respect to ability class $m_0$ are

$$\frac{\partial e_{p,t}^{*}}{\partial m_0} = - \frac{\left( 1 + d^2 \right) \sigma_\varepsilon^2 - 2d \sigma_\nu^2}{\left( 1 - d^2 \right)^2} CC_t \frac{\partial d}{\partial m_0} \tag{31}$$

$$\frac{\partial e_{q,t}^{*}}{\partial m_0} = - \frac{\left( 1 + d^2 \right) \sigma_\nu^2 - 2d \sigma_\varepsilon^2}{\left( 1 - d^2 \right)^2} CC_t \frac{\partial d}{\partial m_0} \tag{32}$$

The first of these is positive if $\frac{\sigma_\varepsilon^2}{\sigma_\nu^2} > \frac{2d}{1 + d^2}$ while the second is positive if $\frac{\sigma_\nu^2}{\sigma_\varepsilon^2} > \frac{2d}{1 + d^2}$. The latter condition is met for all ability classes, so quality effort increases most or decreases least for the highest ability agents. The first derivative is positive only for sufficiently high ability classes. Quantity effort increases the least for agents for whom $m_0$ is such that $\frac{\sigma_\varepsilon^2}{\sigma_\nu^2} = \frac{2d}{1 + d^2}$. This least responsive ability class is closer to the highest ability class, the noisier is the quality signal. These results are the direct equivalent of the second clause of Proposition 2.

Finally, agents prefer pay based on career concerns to a flat wage if

$$CE_t^{cc}(m_0) = \sum_{t=0}^{\infty} \left\{ \left( m_t + e_{p,t}^{*} \right) + \left( m_t + e_{q,t}^{*} \right) - c\left( \vec{e}_t^{\,*} \right) \right\} > \sum_{t=0}^{\infty} \left\{ w_t \right\} = CE_t^{fw} \tag{33}$$

Because $\vec{b}_t^{\,*}$ does not depend on $m_0$, the derivative of the certainty equivalent under career concerns, $CE_t^{cc}(m_0)$, with respect to ability class $m_0$ is the same as the derivative of $CE_t^{pp}(m_0)$ with respect to $m_0$, i.e. (28). From this it immediately follows that the results for selection into pay based on career concerns are exactly the same as the results for selection into the performance pay system above (third clause of Proposition 2).

A2 Additional Institutional Details

A2.1 Performance Pay (W-Pay) and Tenure

Junior Professors, the German equivalent of assistant professors, can earn a (non-pensionable) supplement of 260 euro per month upon positive evaluation in the W-pay scheme and an additional supplement, not to exceed 10% of a junior professor’s basic wage, in special circumstances (Detmer and Preissler, 2005). For comparison, tenured professors can earn performance bonuses up to a total amount of 5241.48 euro per month (see footnote 37) - more than the highest basic wage in the performance pay scheme - or more in special circumstances, such as when the academic already earns bonuses that exceed this limit and a higher bonus is necessary to attract the academic to another German university or prevent them from leaving for another German university, or to attract a scholar from outside German academia or prevent them from leaving German academia (BMBF, 2002; Detmer and Preissler, 2005). Performance incentives are thus much more high-powered for tenured professors, which is why I restrict attention to those in the empirical analyses.

There are two tenured professorial ranks in Germany: the equivalent of an associate professorship (“ausserordentliche (or a.o.) Professur”), and the equivalent of a full professorship (“ordentliche (o.) Professur”). In order to qualify for a tenured affiliation, and performance pay bonuses such as an attraction or retention bonus, academics need to have completed a PhD as well as, traditionally, a post-doctoral qualification (“habilitation”). The habilitation involves working as part of the research group of a full professor, and is completed with a postdoctoral thesis (Fitzenberger and Schulze, 2014; Pritchard, 2006). In 2002 the German equivalent of assistant professorships (“Juniorprofessur”) was introduced as an alternative path to tenured professorships (Pritchard, 2006). Junior professorships can last up to six years and grant aspiring academics more independence than the habilitation (Fitzenberger and Schulze, 2014).
In either case, aspiring professors would need to apply for tenured professorships after the completion of the habilitation or Junior professorship, because during the time period covered by the data

37This limit is set at the difference between the basic wage of W3 and B10 (another, non-professorial pay scheme), which was 5241.48 on 1 August 2004 (Detmer and Preissler, 2005)

for the empirical analysis, tenure-track positions generally did not exist. Furthermore, universities were commonly not allowed to hire their own habilitands or Junior professors as tenured professors due to the so-called “home-hiring ban” (“Hausberufungsverbot”) (Academics.de, 2016). Hence aspiring professors would normally have to move to another university for a first tenured position, and they would start on a new employment contract. Attraction bonuses are, in principle, available for first-time tenured professors if their qualifications and expected academic success warrant such bonuses. Some states, however, stipulate that first-time tenured professors are to be offered the basic wage only, except in exceptional circumstances (Detmer and Preissler, 2006). In response, many professors who just acquired their first tenured position would immediately apply for other tenured positions so as to be able to negotiate an attraction or retention bonus (Detmer and Preissler, 2006). Professors are also generally required to show proof of another offer in order to be able to negotiate a retention bonus (Detmer and Preissler, 2005). The attraction and retention bonuses are thus implicit, market-based incentives. The president or rector of a university or the dean of the relevant faculty usually decides on the attraction or retention bonus (cf. e.g. Bergsdorf (2005); Leitungsgremium (2005); Huber (2005); Universitaet Regensburg (2016)).

A2.2 On-the-job Performance Bonuses

Most universities distinguish several performance levels that are associated with increasing on-the-job performance bonuses (the so-called “Leistungsstufen”) (Harbring, Irlenbusch and Kräkel, 2004; Kräkel, 2006; Lünstroth, 2011). The first performance level of the Regensburg University for instance requires performance to exceed fulfillment of (normal) professorial duties, the second level requires achievements that helped further the national standing of the university, while the third level is reserved for achievements that have improved the international reputation of the university (Universitaet Regensburg, 2016). The lowest performance level generally requires achievements that lie (substantially) above those in line with the ordinary fulfillment of professorial duties (cf. Huber (2005); Universitaet Regensburg (2016); Gien (2017)). The on-the-job performance bonuses thus constitute explicit performance incentives with characteristics akin to target setting or rank-order tournaments. These incentives should affect only academics that fall under the performance pay scheme, and only from the moment that the academics enter into the performance pay scheme. There is substantial variation in the number of performance levels across universities; the number of levels ranges from 2 (e.g. Augsburg and Erfurt University) to 10 (University of Trier), and the associated pay from 90 (Technical University of Berlin) to 2500 euro per month (e.g. Bielefeld and Bremen University) (Lünstroth, 2011). Generally speaking, the university president, rector or council announces either both the number of levels and associated bonus pay or only the total number of bonuses (if the bonus pay amounts are specified in the university’s statutes) to be awarded in a given year at the beginning of that year (Lünstroth, 2011). It is therefore difficult for academics to know, ex ante, at what university they may have a higher chance of earning on-the-job bonuses.

A2.3 Cost Neutrality

The academic pay reform was mandated to be cost-neutral. In particular, the average professorial pay at the federal and state level was to remain at the respective pre-reform levels (see footnote 38) (benchmark year 2001) (BMBF, 2002). In many states, the state’s ministry of education implements the cost-neutrality requirement by calculating university-specific professorial pay averages that are used as the benchmark professorial pay average for the respective university going forward (Handel, 2005). The law does allow for the benchmark professorial pay average to be exceeded by, on average, 2% per year, though not exceeding 10% in total and as long as the state’s budget allows (BMBF, 2002). The budget-neutrality stipulation was explicitly introduced to prevent the reform from leading to cost-cutting or a cost explosion (Detmer and Preissler, 2006; Handel, 2005). Because the basic wage in the performance pay system is lower than most of the age-related wages, the cost neutrality requirement guarantees that whatever is saved on basic wage payments is paid as bonuses in the performance pay scheme. Handel (2005) calculates that, with a pre-reform professorial pay average of 71,000 euro at universities, about 26% of total professorial pay for university professors is available for performance pay bonuses (see footnote 39).
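To give a sense of magnitudes, the short sketch below works through the arithmetic implied by the figures attributed to Handel (2005) and by the 10% ceiling on exceeding the benchmark average. It assumes the 26% share applies to the 71,000 euro average per professor, which is an illustrative simplification; the variable and function names are hypothetical.

```python
AVERAGE_PAY_EUR = 71_000   # pre-reform professorial pay average (Handel, 2005)
BONUS_SHARE = 0.26         # share of pay available for bonuses (Handel, 2005)
MAX_TOTAL_EXCESS = 0.10    # benchmark may be exceeded by at most 10% in total


def bonus_pool_per_professor(average_pay: float, bonus_share: float) -> float:
    """Pay per professor available for bonuses, in the units of `average_pay`."""
    return average_pay * bonus_share


def benchmark_ceiling(average_pay: float, max_total_excess: float) -> float:
    """Highest pay average allowed once the total excess allowance is used up."""
    return average_pay * (1.0 + max_total_excess)


if __name__ == "__main__":
    print(bonus_pool_per_professor(AVERAGE_PAY_EUR, BONUS_SHARE))
    print(benchmark_ceiling(AVERAGE_PAY_EUR, MAX_TOTAL_EXCESS))
```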

A3 Further Details on Data

In this appendix, I describe each of the three core input data sets separately, before providing a detailed description of the preparation and matching procedures used to generate the eventual individual-level panel data set used for the empirical tests in this paper. All data handling was done using Python, unless otherwise indicated.

A3.1 Further Details on DGK

Kuerschners Deutscher Gelehrten Kalender (DGK) is a bibliographic and bibliometric encyclopedia of academics affiliated with German, Austrian and Swiss universities. All people who have obtained the "venia legendi" and are both actively teaching and researching at a relevant university in Germany, Austria or Switzerland are included in DGK. The "venia legendi" encompasses the "habilitation" (a post-doctoral qualification that is acquired through publication of a habilitation thesis after up to six years of research as part of a full professor’s research group ("Lehrstuhl")) and a qualification to teach at university level (the "Lehrbefugnis"). Exceptions to the venia legendi rule for inclusion in DGK are honorary professorships (Honorarprofessoren) and junior professorships (Juniorprofessoren). Universities considered relevant for DGK are generally those that can award doctoral degrees ("Promotionsrecht"). This includes all the public universities to which I restrict attention. Academics who move to a university outside of Germany, Austria or Switzerland are generally dropped from the encyclopedia, unless they personally request to

38The law does allow states to raise their target average professorial pay level to - at most - the highest average professorial pay at the state or federal level. 39For this calculation, Handel (2005) uses 2001/2002 data and assumes that the ratio of W2 to W3 professors at universities will be about the same as that of C3 to C4, namely 46:54.

remain included (Schniederjuergen, 2013a). People whose academic affiliation can no longer be verified are dropped from the encyclopedia too. The information in DGK stems from academic calendars, course rosters/teaching schedules, announcements of appointments by universities and in academic and professional journals, surveys, university websites, etc. (De Gruyter, 2006, 2008). De Gruyter Publishers, the current publishers of the DGK, have kindly supplied me with the editorial database underlying the online DGK edition (current up to 13 July 2013), as well as copies of the exports from this database taken on 10-11-2006, 17-11-2008 and 27-09-2010. This database and its past exports contain all the information of published DGK editions from the same years (all records of people complying with the DGK inclusion criteria set out above), plus inactive records of people who left (German, Austrian or Swiss) academia, passed away or could no longer be traced, activation dates of records (the date when a person first complied with the DGK inclusion criteria and was taken up in the database) and inactivation dates, where applicable40.

A3.2 Further Details on FuL

Forschung und Lehre (FuL) is Germany’s largest higher education and research magazine and has been published monthly by the German higher education association (Deutscher Hochschulverband) since 1994 (DHV, 2014). Every issue contains a section titled "Habilitationen und Berufungen" with announcements of habilitations, the acquisition of the Lehrbefugnis (an authorization to lecture), and the receipt, acceptance or rejection of academic (professorial) positions. These notifications are based on information from press releases from universities, newspapers and professional magazines as well as from readers/individual scientists (DHV, 2002). Electronic copies of past Forschung und Lehre magazines from 1996 onwards can be downloaded from the "archive" section of the magazine’s website (DHV, 1999-2013). I use Forschung und Lehre magazines from 1999 to 2013 for the individual-level affiliation panel, so as to align with the years for which I have (activation) data from DGK. I first extract the text from the pdfs (electronic copies) of the Forschung und Lehre magazines and subsequently exploit the generally formulaic structure of the announcements in the “Habilitationen und Berufungen” section in FuL to extract the desired information regarding the timing and specifics of the habilitations and professorial offers. In the case of a habilitation and/or Lehrbefugnis announcement in FuL, the announcement generally mentions the university at which the Habilitation and/or Lehrbefugnis was obtained, as well as the respective field. Professorial offer ("Berufung") announcements generally mention an academic’s current university affiliation and title, the offer university and offered position (title and subject), as well as whether the offer was obtained (“erhalten”), accepted (“angenommen”) or rejected (“abgelehnt”), or whether the academic was appointed (“ernannt”). I use an extensive set of regular expressions to extract these relevant pieces of

40The DGK editorial database was started in 1996, when the DGK data were migrated from the previous publisher to De Gruyter (Schniederjuergen, 2013b). The earliest activation dates in the database however appear to be 1999, and the affiliation histories used in this paper therefore start as of that year. The De Gruyter database is updated continuously and is used to generate the online version of the DGK. The DGK has had an online version since 2010.

information from the announcements. For habilitation or Lehrbefugnis announcements, I extract the name and current title of the person concerned, the current affiliation of the person and, if different from the current affiliation, the university at which the qualification was obtained, the field in which the qualification was acquired, as well as the subject category under which the announcement was made in the FuL magazine. I take the month and year of the FuL issue in which the announcement was made to be the time when the qualification was obtained, backdated by four months to correct for the average printing lag. In the case of a professorial offer announcement, I record whether the offer was extended, accepted or rejected, or whether the academic was appointed, the name and current title of the person concerned, the current affiliation of the person, the offer university, offered position and field in which the position is offered, as well as the subject category under which the announcement was made in the FuL magazine. In case of multiple offers, I always record accepted or appointed offers first, followed by offers that are obtained. I record rejected offers last. If all offers are only obtained, I record offers from German universities first; otherwise the order is random. For the offer announcements too, I take the month and year of the FuL issue in which the announcement was made to be the time of the corresponding offer event, backdated by four months to correct for the average printing lag41.
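The dating and ordering rules above can be sketched as follows. This is an illustrative Python sketch rather than the code used for the paper: the function names `backdate` and `order_offers`, the status labels used as dictionary keys and the record layout are assumptions. Only the four-month lag, the exemption for offers reported only as obtained (footnote 41) and the recording order are taken from the text. The rule that German universities come first among obtained-only offers is omitted.

```python
from typing import Dict, List, Tuple

PRINT_LAG_MONTHS = 4  # average printing lag stated in the text

# Recording priority: accepted or appointed first, then obtained, rejected last.
STATUS_PRIORITY: Dict[str, int] = {
    "angenommen": 0,  # accepted
    "ernannt": 0,     # appointed
    "erhalten": 1,    # obtained only
    "abgelehnt": 2,   # rejected
}


def backdate(year: int, month: int, status: str) -> Tuple[int, int]:
    """Map the (year, month) of a FuL issue to the assumed event date."""
    if status == "erhalten":
        # Offers reported only as obtained are not backdated (footnote 41).
        return year, month
    # Count months from year 0, subtract the lag, convert back.
    index = year * 12 + (month - 1) - PRINT_LAG_MONTHS
    return index // 12, index % 12 + 1


def order_offers(offers: List[dict]) -> List[dict]:
    """Order simultaneous offers by recording priority (stable sort)."""
    return sorted(offers, key=lambda offer: STATUS_PRIORITY[offer["status"]])
```

For instance, an accepted offer announced in the February 2003 issue is dated to October 2002 under this rule, in line with the example in footnote 49.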

A3.3 Further Details on ISI

The ISI Web of Science database (hereafter: ISI) is compiled and maintained by Clarivate Analytics (and, before that, by Thomson Reuters) and can be accessed via the website webofknowledge.com (Clarivate Analytics, 1993-2012a). From this database, I restrict attention to publications from the following databases: Science Citation Index Expanded (SCI-Expanded), Social Sciences Citation Index (SSCI), Arts and Humanities Citation Index (AHCI), Conference Proceedings Citation Index - Science (CPCI-S) and the Conference Proceedings Citation Index - Social Sciences & Humanities (CPCI-SSH). I downloaded all records of publications with at least one author with a German (work) address and published between 1993 and 2012 from the ISI website.

A3.4 Matching DGK and FuL

I make the information from FuL and DGK compatible by replacing university names in the FuL and DGK databases with unique identifiers, mapping titles and positions to a unified list of existing titles and positions, and classifying a title or position as being tenured or non-tenured. The following are tenured professorial positions: C3-Professor, W2-Professor, Ausserordentliche Professor and Associate Professor as well as C4-Professor, W3-Professor, Ordentliche Professor and U(niversitaets)-Prof. Furthermore, I classify all subject areas distinguished in DGK under 12 broad field categories. These are the fields distinguished in the ’Habilitationen und Berufungen

41Offers that were only reported as being obtained are not backdated, reflecting the fact that only appointments and offer acceptance or rejection are reported four months later on average.

section’ of FuL: theology; philosophy and history; social sciences; philology and cultural studies; law; economics; mathematics, physics and computer science; biology, chemistry, earth sciences and pharmaceutics; engineering; agricultural sciences, nutrition and veterinary medicine; medicine (human); dentistry. For example, the DGK subject areas ’rechtswissenschaft’ and ’rechtsgeschichte’ are mapped into the FuL-field ’law’, while ’immunologie’ and ’molekulare medizin’ are mapped into the FuL-field ’medicine’42. Subsequently, I distill a list of unique academics from both the FuL and DGK records. To do so, I match academics appearing in FuL with academics in DGK on the basis of their last name, subject area and initials, and subsequently deduplicate the list of academics on the basis of these same criteria. As last name, I use the name after the last space in the FuL name field, with any hyphens in composite last names deleted (so e.g. Schmidt-Angel becomes SchmidtAngel). Composite first names are separated first and the first letters of all name components are taken to be the initials (e.g. Anna-Maria has initials A, M). I match on initials rather than first names, because the publication records in ISI list initials only. Matching on initials, last name and subject area is therefore the best I can do across all three main input data sets. I require at least one of the FuL-field codes for the field or subject areas listed for an academic in DGK to be the same as the FuL-field code an academic is classified under in FuL. If an academic does not have a subject area listed in DGK, or if this subject area cannot be classified under one of the FuL-field codes, a match is attempted on the basis of last name and initials only.
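The construction of the matching keys described above can be illustrated with a short Python sketch. The function names (`last_name_key`, `initials`, `fields_compatible`) and the representation of field codes as a set of strings are assumptions made for illustration; the rules themselves (name after the last space, hyphens deleted, first letters of all first-name components, and the fallback when no field code is available) follow the text.

```python
import re
from typing import Set, Tuple


def last_name_key(full_name: str) -> str:
    """Use the token after the last space and delete any hyphens."""
    return full_name.strip().split(" ")[-1].replace("-", "")


def initials(first_names: str) -> Tuple[str, ...]:
    """Split composite first names and keep the first letter of each part."""
    parts = [p for p in re.split(r"[\s-]+", first_names.strip()) if p]
    return tuple(p[0].upper() for p in parts)


def fields_compatible(ful_field: str, dgk_fields: Set[str]) -> bool:
    """Require a shared FuL-field code unless DGK has no usable field code."""
    if not dgk_fields:
        # No (classifiable) subject area in DGK: match on name and initials only.
        return True
    return ful_field in dgk_fields
```

Using the examples from the text, `last_name_key("Anna-Maria Schmidt-Angel")` yields `"SchmidtAngel"` and `initials("Anna-Maria")` yields `("A", "M")`.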
To improve matching accuracy, I discard a potential match if:
a) a person’s last (most recent) announcement in FuL is made while a potential match in DGK is over 67 years old43 (based on birth year given in DGK);
b) a potential DGK match has a date of passing that falls before the last (most recent) announcement year in FuL;
c) a potential DGK match is reported to have retired in DGK-year-x, while there are FuL announcements after year x; or
d) a potential DGK match is reported as having a tenured position before the habilitation year reported in FuL44.
As mentioned in the main text, 83% of academics who appeared as having a tenured affiliation at a German university in FuL can be matched to academics listed in DGK. Failed matches are mostly due to spelling inconsistencies or errors, which I resolve manually where possible.
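The four discard rules can be expressed as a single predicate. The following Python sketch is illustrative only: the function name `discard_match` and its argument names are hypothetical, and missing information is assumed to leave the corresponding rule inactive. The one-year allowance in rule d) follows footnote 44.

```python
from typing import Optional

MAX_AGE = 67  # cut-off used in rule a)


def discard_match(last_ful_year: int,
                  birth_year: Optional[int] = None,
                  death_year: Optional[int] = None,
                  retirement_year: Optional[int] = None,
                  first_tenure_year: Optional[int] = None,
                  habilitation_year: Optional[int] = None) -> bool:
    """Return True if a potential FuL-DGK match violates any of rules a) to d)."""
    # a) older than 67 at the time of the most recent FuL announcement
    too_old = birth_year is not None and last_ful_year - birth_year > MAX_AGE
    # b) passed away before the most recent FuL announcement year
    deceased = death_year is not None and death_year < last_ful_year
    # c) FuL announcements after the reported retirement year
    retired = retirement_year is not None and retirement_year < last_ful_year
    # d) tenured more than one year before the habilitation year in FuL
    tenured_early = (first_tenure_year is not None
                     and habilitation_year is not None
                     and first_tenure_year < habilitation_year - 1)
    return too_old or deceased or retired or tenured_early
```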

42The full mapping is available from the author upon request. 43German law mandates that academics retire at the age of 65 (Mohr, 2007; Bundesgesetzblatt, 1985), so unless an academic moves abroad around the time of mandated retirement in Germany (cf. Mohr, 2007), I should not observe FuL announcements regarding new affiliations for an academic who is past the age of retirement. I use 67 as cut-off instead of 65 to allow for some delay in a possible move abroad or FuL’s reporting thereof. 44Where I allow for the habilitation announcement to occur in the year after tenure, to accommodate cases in which an academic obtains a tenured position immediately upon passing the habilitation while the habilitation announcement is delayed.

A3.5 Matching Academics with Publications

I match publications from ISI to the deduplicated list of academics appearing in FuL and DGK on the basis of last name, initials and subject area. To enable matching on subject area, I map the Web of Science categories listed for journals to the aforementioned 12 FuL-field codes. The Web of Science categories ’Ethics’ and ’History’ are mapped into the FuL-field ’philosophy and history’ for instance, while the categories ’Economics’ and ’Industrial Relations & Labor’ are mapped to the FuL-field ’economics’45. Furthermore, to deal with differences in the way last names are represented across sources, I abstract from common prefixes such as ’von’ and ’von der’. I match publications to academics using successive sets of criteria. I first try to match publications to academics on the basis of the last name, all initials and field or subject area. If multiple academics can be matched to a publication on the same criteria, the publication is attributed to all matched academics46. If there are no matches on last name, initials and field, I attempt a match on a slightly different set of criteria: last name with any spaces and hyphens removed, initials and field or subject. If still no matches are found, I move on to the next set of criteria, and so on. The subsequent sets of criteria are: last name, first initial only and field or subject; last name without spaces and hyphens, first initial only and field or subject; first last name (if composite last name such as “Gross Herzenberg”, for male academics only), all initials and field or subject; first last name (if composite last name such as “Gross Herzenberg”, for male academics only), first initial only, and field or subject; second last name (if composite last name such as “Schmidt-Bauer”, for female academics only), all initials and field or subject; second last name (if composite last name such as “Schmidt-Bauer”, for female academics only), first initial only and field or subject.
If still no match has been found, I attempt to match the publication to academics who do not have a field or subject code47, on the basis of (in order): last name and all initials; last name without spaces and hyphens and all initials; last name and first initial only; last name without spaces and first initial only.
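The cascading logic of this procedure, in which each set of criteria is tried only if all earlier sets produced no match, can be sketched in Python as follows. The sketch is a simplified illustration, not the matching code itself: the record layout, the names `TIERS`, `squash` and `match_publication`, and the choice to normalize names on both sides of the comparison are assumptions, and the gender-specific composite-name tiers are left out for brevity.

```python
from typing import Callable, List, Tuple

Key = Tuple[str, Tuple[str, ...]]


def squash(name: str) -> str:
    """Remove spaces and hyphens from a last name."""
    return name.replace(" ", "").replace("-", "")


# Successively looser keys built from (last name, initials).
TIERS: List[Callable[[str, Tuple[str, ...]], Key]] = [
    lambda ln, ini: (ln, ini),               # last name, all initials
    lambda ln, ini: (squash(ln), ini),       # squashed last name, all initials
    lambda ln, ini: (ln, ini[:1]),           # last name, first initial only
    lambda ln, ini: (squash(ln), ini[:1]),   # squashed last name, first initial
]


def match_publication(pub: dict, academics: List[dict]) -> List[dict]:
    """Return all academics matched at the first tier that yields a match."""
    # First pass requires a shared field; second pass covers academics
    # without any field code, matched on name and initials only.
    for require_field in (True, False):
        for tier in TIERS:
            pub_key = tier(pub["last_name"], pub["initials"])
            hits = [
                a for a in academics
                if (pub["field"] in a["fields"] if require_field else not a["fields"])
                and tier(a["last_name"], a["initials"]) == pub_key
            ]
            if hits:
                return hits  # attributed to every academic matched at this tier
    return []
```

Returning all hits at the first successful tier mirrors the rule that a publication is attributed to every academic matched on the same criteria.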

A3.6 Creating an Individual-Level Panel

The starting point for the individual-level panel of affiliations over time is the set of FuL announcements. I cross-check and, where necessary, supplement these with information from DGK. For any FuL offer announcement, the current university of a person, their current position (title) and whether this concerns a tenured affiliation are filled back in time from the year before the FuL announcement year to the year that FuL reported as the year in which the person passed

45The full mapping is available from the author upon request. 46Note, however, that I discard academics that share the same last name, initials and field or subject from the eventual analysis sample so as to reduce the measurement error from double-counting of publications. 47This is the case if the subject area recorded in DGK could not be mapped to an FuL field code or if no subject area was recorded in DGK

their habilitation or Lehrbefugnis, or the start year of the panel, whichever is earlier48,49. If the FuL announcement concerns an accepted offer, the new university, new position (title) and whether the position is tenured or not is filled forward from the year of the FuL announcement to the last year of the panel50. If the FuL announcement concerns an appointment (“ernannt”), an academic’s current university is taken to be the offer university (i.e. an appointment to a different position within the same university) unless the offer university is specifically stated to be different from an academic’s current university. The offer university, offered position (title) and whether the position is tenured is filled forward in the same way as with an announcement of an accepted offer. If the FuL announcement states that an offer was rejected, the current university, current position and whether the position is tenured or not is filled forward as above. Finally, for an announcement of a received offer (“erhalten”), the information regarding the offer university, position and whether the position is tenured is filled forward tentatively, with the option to overwrite if new information arrives that supersedes it. From each DGK edition, I extract the affiliation information and other career and personal information provided by the editors of DGK, as well as self-reported career information provided by academics and listed in one of the DGK data fields. To extract useful information from the self-reported career information field, I again exploit regularities in the structure of the information pieces with an extensive set of regular expressions. I discard information pieces that have no (valid) start date, university name or title. I also discard information about temporary positions, such as visiting positions.
Starting from the last (most recent) piece of self-reported career information going backward, I fill out the affiliation information contained in this piece from the year listed as the start date of the position to the end date (if listed), the start date of a new (more recent) position, or publication date of the current DGK edition, provided one of the following conditions is met: 1) the information piece is the last piece in the self-reported career information field, and the information matches the affiliation information provided by DGK editors, 2) there is already affiliation information for a particular year and that information is derived from a self-reported piece in the current DGK edition or it concerns affiliation information provided by DGK editors in a previous DGK edition, or 3) there is already affiliation information for a particular year and that information stems from an FuL announcement made before the publication date of the current DGK edition. Next, I fill out the editor-provided affiliation information, starting from the publication year of the DGK edition and going backward. I do not overwrite affiliation information that stems from an FuL announcement in the same year as the DGK edition or the year thereafter. I also do not overwrite affiliation information derived

48If an affiliation is already filled out in a year before the offer announcement, the current position is checked for consistency with the affiliation already recorded in the panel. If the recorded affiliation is incomplete (e.g. contains only a title or university) but matches the new information, the partial affiliation information is supplemented with the new information. If the two do not match up, the current FuL information is filled out for the year immediately prior to the FuL announcement only, an error message is created and the case is left for further, case-by-case evaluation. 49Except for FuL announcements of offers that have been extended only (i.e. not yet accepted, rejected or appointed), the start date of the corresponding affiliation (or obtainment of habilitation or Lehrbefugnis) is backdated by 4 months to correct for the average lag in reporting of this information. The announcement of an accepted offer in e.g. February 2003 is thus interpreted as though the new position was obtained in 2002. 50Note that this information is overwritten when new information, from a later announcement, arrives. There is thus chronological updating.

from an FuL announcement when the DGK affiliation matches the previous affiliation listed in the FuL announcement (this suggests DGK is not up-to-date for the academic). Furthermore, I use the editor-provided information to correct any mislabeling in FuL of honorary or temporary professorial positions as tenured positions. In a last step, I make sure that any earliest affiliations are filled backward until an academic received their habilitation or Lehrbefugnis, or otherwise became active in German academia and I delete any affiliation information before these starting points. Similarly, I fill any latest affiliations forward until the affiliation’s end year (if listed in DGK), year of passing of the academic, or year in which the academic otherwise left German academia, and I delete any affiliation information filled out beyond these years. Finally, I drop any affiliation information for years that fall outside of the period for which I have affiliation information from DGK and FuL (1999-2013).
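The chronological updating of accepted-offer announcements (footnote 50) can be illustrated with a minimal Python sketch. It covers only the fill-forward step with overwriting by later announcements. The cross-checks against DGK, the backward filling and the tentative treatment of received offers are not modeled, and the function name `fill_forward` and the `(year, university)` tuple layout are assumptions.

```python
from typing import Dict, List, Tuple

PANEL_START, PANEL_END = 1999, 2013  # panel period stated in the text


def fill_forward(announcements: List[Tuple[int, str]]) -> Dict[int, str]:
    """Fill each announced affiliation forward to the last panel year.

    Announcements are processed in chronological order, so information
    from a later announcement overwrites earlier fill-forward values.
    """
    panel: Dict[int, str] = {}
    for year, university in sorted(announcements):
        for y in range(max(year, PANEL_START), PANEL_END + 1):
            panel[y] = university
    return panel
```

With announcements dated 2003 (university A) and 2008 (university B), the panel records A for 2003 to 2007 and B from 2008 to 2013.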

A4 Further Empirical Analyses

A4.1 Synthetic Cohorts

As a further test of the validity of the identifying assumption, and of the causal interpretation of the productivity differentials between the treated and control cohort as the effort effect of the higher-powered incentives of the performance pay reform, I estimate the baseline regression with synthetic cohorts, which are defined by the average age at which professors at German public universities start their first tenured affiliation. This average tenure age is 44. The synthetic treatment cohort then comprises academics whose synthetic first tenure year falls between 2005 and 2007, while the synthetic control cohort is made up of academics whose synthetic first tenure year falls between 2002 and 2004. I calculate the synthetic first tenure year by adding 44 to an academic’s birth year if this is known, and to a synthetic birth year otherwise. In turn, the synthetic birth year is derived from the year in which academics pass their habilitation or, if I do not have this information, the year in which academics receive their PhD, by subtracting from that year the average age at which academics who become tenured professors at public universities pass their habilitation or receive their PhD, respectively51. Panel D in Appendix Table A.2 shows that the results are qualitatively similar with synthetic cohorts, with increases in the raw and impact-factor weighted number of publications of 10.4% to 15.7% and a decrease in average impact of 12.4% (all significant at 5%) as of the moment the higher-powered implicit incentives of the performance pay reform take effect.
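The cohort assignment can be made concrete with a small Python sketch. It reads the derivation of the synthetic birth year as the habilitation or PhD year minus the respective average age (37 and 30, footnote 51), which is an interpretation of the text; the function and constant names are hypothetical.

```python
from typing import Optional

AVG_TENURE_AGE = 44  # average age at first tenured affiliation
AVG_HABIL_AGE = 37   # average age at habilitation (footnote 51)
AVG_PHD_AGE = 30     # average age at PhD (footnote 51)


def synthetic_birth_year(birth_year: Optional[int] = None,
                         habilitation_year: Optional[int] = None,
                         phd_year: Optional[int] = None) -> Optional[int]:
    """Use the known birth year, else impute it from habilitation or PhD year."""
    if birth_year is not None:
        return birth_year
    if habilitation_year is not None:
        return habilitation_year - AVG_HABIL_AGE
    if phd_year is not None:
        return phd_year - AVG_PHD_AGE
    return None


def synthetic_cohort(birth_year: Optional[int] = None,
                     habilitation_year: Optional[int] = None,
                     phd_year: Optional[int] = None) -> Optional[str]:
    """Assign the synthetic cohort from the synthetic first tenure year."""
    born = synthetic_birth_year(birth_year, habilitation_year, phd_year)
    if born is None:
        return None
    tenure_year = born + AVG_TENURE_AGE
    if 2005 <= tenure_year <= 2007:
        return "treatment"
    if 2002 <= tenure_year <= 2004:
        return "control"
    return None  # outside both synthetic cohorts
```

Under these assumptions, an academic born in 1962 has a synthetic first tenure year of 2006 and falls in the synthetic treatment cohort, while an academic with unknown birth year who passed the habilitation in 1996 has a synthetic first tenure year of 2003 and falls in the synthetic control cohort.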

A4.2 Instrumental Variables Approach

An alternative way to estimate the effort effect is to focus on switchers instead; estimating the baseline specification for academics who hold a tenured affiliation before the reform and labeling

51These average ages are 37 and 30, respectively.

those that switch to performance pay as the treated group. The assignment to this treatment group is however endogenous, and I therefore use an instrumental variables approach. Figure A.1 shows that older academics earn a higher basic wage under the age-related pay system; specifically when they are 33 or 43 years of age or older. Age and age-related variables may therefore have explanatory power for selection into performance pay and could thus potentially serve as instruments. I use two sets of age-related variables as instruments: Age and age-squared (Panel A in Table A.8), and indicator variables that are equal to one if an academic is 33 and 43 years of age or older (Panel B in Table A.8), respectively. Because I do not have age information for all academics, I construct a synthetic age variable for which I impute unknown ages using the average age at PhD, habilitation or tenure. I estimate an instrumented version of the baseline specification, with Post’02 and Contract−Change interacted with a Treatment dummy as the interaction variables of interest. The Treatment dummy is one for academics who held a tenured affiliation at a public university before implementation of the pay reform, and who change affiliation, position or contract after implementation. This variable is endogenous. Contract−Change is one in the year in which a position, affiliation or contract change happens, as well as all following years. If this contract change happens after 2005, it coincides with the moment an academic enters into the performance pay scheme and, as such, it is the equivalent to the Tenured variable in the baseline specification. Both the Treatment dummy and Contract−Change variable are possibly endogenous. I follow Aghion, Howitt and Mayer-Foulkes (2005) in instrumenting for the endogenous terms with interacted instruments.
That is, I use Synthetic Age and Synthetic Age Squared, interacted with the Post’02 and Post’05 variables, as instruments for Post’02 ∗ Treatment and Contract−Change ∗ Treatment. I estimate the resulting instrumented panel fixed effects model using the two-stage efficient GMM estimator with robust standard errors clustered at the individual level52,53. The resulting estimates of the effort effect are qualitatively similar to the baseline results presented in the main text, with increases in total (quality-adjusted) research output and impact, and - imprecisely estimated - decreases in average impact54. Furthermore, the magnitude of the effect estimates is much larger than in the equivalent panel fixed effects model (Panel C in Table A.8). This is likely due, at least in part, to the instruments not being valid in at least the age and age-squared instrumented total quantity and total impact regressions: the Hansen J statistic suggests the instruments are not uncorrelated with the error term in columns 1 through 3 of Panel A. Indeed, it is entirely plausible that age affects productivity not just through

52I use the 2-step GMM estimator so as to derive efficient estimates in the presence of arbitrary heteroskedasticity and clustering (Wooldridge, 2010). I estimate the model as a linear IV (panel fixed effects) model to be able to perform a number of IV diagnostic tests, even though the first stage would be most appropriately estimated as a hazard rate model and the second stage as a Poisson model. 53To align the sample with that used for the effort effect estimation in the main text, I restrict attention to academics who started their first tenured affiliation in 2002, 2003 or 2004, and who did not hold a foreign affiliation immediately preceding this. Results are however robust to including all academics who started their first tenured affiliation in 1999-2004, with no foreign affiliation directly prior (results available from the author on request). 54These results are robust to substituting Post’05 ∗ Treatment for Contract−Change ∗ Treatment in the second stage. Results are also robust to estimation as 2SLS instead of GMM, though the former is not efficient under heteroskedasticity, and hence not preferred (or reported) here. Both sets of robustness results are available from the author upon request.

its effect on the likelihood of switching to performance pay, but directly as well. Furthermore, the instrumented interaction terms appear not to be endogenous in the average quality or impact regressions throughout and in the over-32 and over-42 instrumented quality adjusted and total impact regressions. Because of these misspecification concerns, these instrumental variables estimation results should be interpreted with extreme caution, and taken to be indicative, at best.

A4.3 Tenure Probability

The effort effect estimates may be biased if the probability of obtaining a tenured affiliation changes with the reform. In particular, I have to rule out that tenure requirements go up, as this would lead to only relatively more productive academics obtaining tenured positions after implementation of the reform. To this end, I estimate the tenure probability using hazard rate analysis, much like the analysis of switching rates in the selection effect section in the main text of the paper (Section 4.3). Here the event of interest is the start of the first tenured affiliation55, and academics are “at risk” of obtaining a tenured affiliation after completion of the habilitation. For academics for whom the habilitation year is unknown, I impute it using the average age at completion of the habilitation. I estimate Weibull proportional hazard models of the tenure probability as a function of synthetic age (defined as above) and productivity and the interactions of those variables with the Post’05 trend-break variable, controlling for academic field. As productivity variables I use the two-year lag (Panel A in Table A.9) and one-year lag (Panel B in Table A.9), respectively, of the number of publications (columns 1a and 1b) and the impact factor-rated number of publications (columns 2a and 2b). As expected, a higher productivity increases the probability of obtaining a first tenured affiliation in general, but there is no additional increase after implementation of the reform (compare the coefficient estimates of the Productivity and Post’05 ∗ Productivity variables). That is to say, there is no evidence that the requirements for becoming a tenured professor increase with the reform, and hence there is no evidence that academics who start their first tenured position after the reform are more productive.
Results are robust to weighting the productivity variables by number of authors, or using productivity averages over the pre-reform years 2002-2004 (results available from the author upon request).
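For reference, the Weibull proportional hazard model described in this section can be written in its standard form as below. The notation is illustrative and is not taken from the estimation code: $t$ denotes time since (imputed) habilitation, $p > 0$ is the Weibull shape parameter, $k \in \{1, 2\}$ is the productivity lag, and $\gamma_{f(i)}$ stands for the academic field controls. Any further terms in the actual specification are omitted.

```latex
h(t \mid x_{it}) = p \, t^{\,p-1} \exp\!\left( x_{it}'\beta \right),
\qquad
x_{it}'\beta = \beta_0
  + \beta_1 \,\text{Age}_{it}
  + \beta_2 \,\text{Productivity}_{i,t-k}
  + \beta_3 \,\text{Post'05}_t \times \text{Age}_{it}
  + \beta_4 \,\text{Post'05}_t \times \text{Productivity}_{i,t-k}
  + \gamma_{f(i)}
```

In this notation, the comparison discussed above is between $\beta_2$ (the general productivity effect) and $\beta_4$ (the additional effect after implementation of the reform).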

55To align with the preceding analyses, I restrict attention here to first tenured affiliations when the preceding affiliation is not foreign, though results are robust to including all first observations of tenured positions (results available from the author upon request).

Figure A.1: Comparison of Basic Wage Schedules
Notes: The figure above shows the monthly wages (in euros) by age for the various pay levels in the age-related (C) and performance pay (W) schemes. The depicted wages were valid as of 1 August 2004 in former West-German states; the corresponding monthly wages in former East-German states were 92.5% of these (Detmer and Preissler, 2006). Data source: Oeffentlicher Dienst (2004).

Figure A.2: Extensive Margin Response
Notes: Results from separate logit regressions of the probability that an academic has at least one publication in a given year. Samples are restricted to, respectively, the top five productivity deciles (P90-P50) and below median productivity academics (M1). Productivity deciles are determined on the basis of the averages of the impact factor-rated number of publications over the three pre-announcement years 1999, 2000 and 2001, separately by academic field and treatment group. All other specifications as before.

Figure A.3: Productivity Metrics on the Intensive Margin
Panels: (a) Number of Publications; (b) Total Impact Factor Rating; (c) Total Number of Citations.
Notes: Estimation results of separate regressions by productivity quantile with the following conditional dependent variables (conditional on having at least one publication): number of publications, impact factor-rated number of publications, total number of citations to all publications published in a given year. Samples are restricted to, respectively, the top five productivity deciles (P90-P50) and below median productivity academics (M1). All other specifications as before.

Figure A.4: Heterogeneous Results for Backward Cosine Similarity
Panels: (a) 10th (top) Decile; (b) 9th Decile; (c) 8th Decile; (d) 7th Decile; (e) 6th Decile; (f) Below Median.
Notes: Results from separate regressions with backward cosine similarity quantile frequency variables as dependent variable in each sub-figure. Samples are restricted to, respectively, the top five productivity deciles (sub-figures a-e) and below median productivity academics (sub-figure f). All other specifications as before.

Figure A.5: Heterogeneous Results for Forward Cosine Similarity
Panels: (a) 10th (top) Decile; (b) 9th Decile; (c) 8th Decile; (d) 7th Decile; (e) 6th Decile; (f) Below Median.
Notes: Results from separate regressions with forward cosine similarity quantile frequency variables as dependent variable in each sub-figure. Samples are restricted to, respectively, the top five productivity deciles (sub-figures a-e) and below median productivity academics (sub-figure f). All other specifications as before.

[Figure A.6 panels: (a) Citation-Based Quantiles; (b) Weighted Impact Factor Rating Based Quantiles]

Figure A.6: Smoothed Hazard Rate Curves Notes: Epanechnikov kernel-density estimates of the hazard function for switching to the performance pay scheme for academics in the top quartile (red line) and bottom three quartiles (blue line) of the average productivity distribution. Quartiles are determined based on three pre-implementation year averages (2002, 2003 and 2004) of the sum of citations to publications (sub-figure a) and the impact factor-rated publications weighted by number of authors (sub-figure b). Quartiles are derived separately by academic field and tenure year. The sample is restricted to academics who held a tenured affiliation at a public university before 2005.
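The kernel smoothing behind curves of this kind can be sketched as follows. This is a minimal illustration, not the estimation code used for the figure: it assumes the hazard contribution at each event time (for example, switches divided by the number at risk) has already been computed, and the names `epanechnikov`, `smoothed_hazard` and the `bandwidth` argument are hypothetical.

```python
def epanechnikov(u: float) -> float:
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def smoothed_hazard(times, hazard_increments, grid, bandwidth):
    """Kernel-smooth discrete hazard contributions onto a grid of time points.

    times: event times at which a hazard contribution is observed.
    hazard_increments: hazard contribution at each event time.
    grid: time points at which the smoothed hazard is evaluated.
    bandwidth: kernel bandwidth (an assumed tuning parameter).
    """
    smoothed = []
    for t in grid:
        # Weight each contribution by its kernel distance to the grid point.
        total = sum(
            epanechnikov((t - s) / bandwidth) * dh
            for s, dh in zip(times, hazard_increments)
        )
        smoothed.append(total / bandwidth)
    return smoothed
```

A larger bandwidth gives a smoother curve at the cost of blurring sharp changes in the switching rate.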

Table A.1: Additional Baseline Results

Panel A (columns 1-2): Maximum and Minimum Number of Citations

Panel B (columns 3-4): Average Number of Co-authors and Pages

Maximum citations Minimum citations Co-authors Pages

Post’02 * Treatment 0.001 -0.176 0.054 -0.051

(0.066) (0.136) (0.155) (0.042)

Tenure * Treatment -0.063 -0.058 -0.213 0.029

(0.078) (0.171) (0.183) (0.036)

Number of Observations 47789 45928 47092 48097

Number of Individuals 4110 3967 3933 4216

Log Likelihood -2366039.795 -525485.210 -268258.275 -131295.152

Chi-squared 430.387 457.933 426.031 176.011

Notes: The dependent variables in Panel A are, respectively, the maximum and minimum number of citations to the publications of an academic in a given year. In panel B, the dependent variables are the average number of co-authors on publications and the average number of pages of publications, respectively, where the average is taken over all publications of an academic in a given year. All other specifications are as in the baseline estimation of the effort effect.

Table A.2: Robustness Checks - Alternative Models

Panel A: Linear FE Model

# Publications IF-rated publications Citations Average IF-rating Average citations

Post’02 * Treatment 0.300** 0.875 7.634 -0.216** -3.561*

(0.120) (0.594) (6.770) (0.093) (2.160)

Tenure * Treatment -0.197 -0.489 -3.141 0.069 -0.421

(0.166) (0.815) (8.319) (0.102) (2.365)

Number of Observations 108363 108363 108363 48552 48552

Number of Individuals 6039 6039 6039 4671 4671

Log Likelihood -294977.403 -471443.345 -740680.174 -102809.237 -262588.501

Panel B: Publication Variables Weighted by Number of Authors

Post’02 * Treatment 0.177*** 0.110** 0.111 -0.099*** -0.105

(0.036) (0.053) (0.072) (0.037) (0.068)

Tenure * Treatment 0.015 0.117* 0.117 0.028 0.011

(0.042) (0.064) (0.079) (0.040) (0.077)

Number of Observations 83937 74326 78308 47052 47789

Number of Individuals 4671 4136 4357 3917 4110

Log Likelihood -64857.187 -109658.385 -1074350.682 -70155.980 -689216.004

Chi-squared 1941.485 1785.932 771.481 417.241 606.337


Panel C: Without Switchers

Post’02 * Treatment 0.207*** 0.178*** 0.158** -0.093** -0.103

(0.040) (0.055) (0.067) (0.040) (0.066)

Tenure * Treatment -0.009 0.004 0.001 0.021 -0.012

(0.044) (0.059) (0.074) (0.042) (0.075)

Number of Observations 73272 64920 68380 41135 41764

Number of Individuals 4078 3613 3805 3421 3579

Log Likelihood -118583.219 -296315.765 -3973456.609 -62391.047 -656767.678

Chi-squared 2508.109 2425.435 933.761 519.380 402.585

Panel D: Synthetic Cohorts

Post’02 * Treatment 0.099** 0.146** 0.027 -0.022 -0.132**

(0.041) (0.059) (0.069) (0.031) (0.055)

Tenure * Treatment -0.029 -0.046 -0.028 0.042 0.074

(0.044) (0.064) (0.073) (0.033) (0.059)

Number of Observations 36548 31116 33204 20975 21403

Number of Individuals 2033 1731 1847 1636 1752

Log Likelihood -60018.452 -148221.305 -1863790.551 -31268.893 -287069.199

Chi-squared 1291.025 1200.136 524.503 328.138 269.700

Notes: Panel A presents the results of estimation of the baseline specification as a panel fixed effects model. In Panel B, the dependent variables are weighted by the number of authors on a publication. In Panel C, the control group is restricted to only those academics that do not switch to the performance pay scheme, where any first affiliation, position or contract change after implementation of the pay reform (as of 2005) is considered a switch. In Panel D, assignment to the treatment and control cohorts is not based on the actual first tenured year, but based on the average age at which academics start their first tenured affiliation. All other specifications are as before.
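The author weighting in Panel B can be illustrated with a short sketch. It assumes that weighting means dividing each publication's contribution by its number of authors (fractional counting); the notes do not spell out the formula, so this is an assumption, and the function name `author_weighted_total` is hypothetical.

```python
def author_weighted_total(values, n_authors):
    """Sum per-publication values, each divided by that publication's author count.

    values: one entry per publication (1 for a publication count, or the
        publication's citations or impact factor rating).
    n_authors: number of authors on each publication (each must be >= 1).
    """
    if len(values) != len(n_authors):
        raise ValueError("values and n_authors must have the same length")
    # Assumed weighting rule: each publication contributes value / authors.
    return sum(v / n for v, n in zip(values, n_authors))
```

For example, three publications with 1, 2 and 4 authors would count as 1.75 author-weighted publications under this rule.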

Table A.3: Further Robustness Checks

Panel A: Publication Variables Restricted to Articles Only

Post’02 * Treatment 0.165*** 0.129** 0.130** -0.092** -0.110*

(0.040) (0.055) (0.061) (0.039) (0.056)

Tenure * Treatment -0.046 -0.033 -0.026 0.025 -0.033

(0.041) (0.053) (0.063) (0.040) (0.060)

Number of Observations 81057 72316 77308 44499 45106

Number of Individuals 4510 4024 4301 3790 3974

Log Likelihood -120114.906 -278455.246 -4048814.424 -66542.784 -679164.726

Chi-squared 2556.845 3229.310 1158.145 897.142 452.101

Panel B: Inverse Hyperbolic Sine Transform Specification

Post’02 * Treatment 0.082*** 0.074*** 0.138*** -0.045** -0.040

(0.020) (0.028) (0.049) (0.022) (0.049)

Tenure * Treatment -0.021 -0.008 0.013 0.006 0.041

(0.024) (0.031) (0.054) (0.023) (0.055)

Number of Observations 108363 108363 108363 48552 48552

Number of Individuals 6039 6039 6039 4671 4671

Log Likelihood -95546.352 -129072.048 -192127.937 -32323.396 -73172.001

Chi-squared 2927.707 2987.478 2115.081 1454.155 540.056

Panel C: With absolute(time-to-tenure) instead of time-to-tenure dummies

Post’02 * Treatment 0.166*** 0.166*** 0.180*** -0.037* -0.023

(0.031) (0.041) (0.047) (0.022) (0.043)

Tenure * Treatment -0.042 -0.038 -0.041 -0.025 -0.045

(0.034) (0.044) (0.051) (0.025) (0.049)

Number of Observations 83937 74326 78308 47052 47789

Number of Individuals 4671 4136 4357 3917 4110

Log Likelihood -136676.95 -338396.973 -4510367.867 -70699.505 -736831.695

Chi-squared 2791.631 2771.287 1091.63 578.407 446.108

Panel D: With Post’05 * Treatment Interaction

Post’02 * Treatment 0.158*** 0.132*** 0.119* -0.092*** -0.076

(0.035) (0.049) (0.063) (0.032) (0.065)

Post’05 * Treatment 0.006 0.028 0.037 0.021 -0.070

(0.037) (0.051) (0.064) (0.032) (0.065)

Number of Observations 83937 74326 78308 47052 47789

Number of Individuals 4671 4136 4357 3917 4110

Log Likelihood -136647.402 -338246.655 -4508120.087 -70677.188 -736134.759

Chi-squared 2860.852 2846.786 1100.804 608.041 478.565


Panel E: 4-Year Treatment and Control Groups

# Publications IF-rated publications Citations Average IF-rating Average citations

Post’02 * Treatment 0.138*** 0.093** 0.066 -0.064** -0.094*

(0.034) (0.046) (0.055) (0.031) (0.053)

Tenure * Treatment -0.005 0.027 0.081 0.018 0.032

(0.035) (0.046) (0.056) (0.031) (0.059)

Number of Observations 104987 93038 97636 57688 58553

Number of Individuals 5842 5177 5432 4888 5115

Log Likelihood -167728.232 -416551.362 -5601408.411 -86337.296 -907781.141

Chi-squared 3541.787 3438.503 1344.787 756.156 574.359

Panel F: 2-Year Treatment and Control Groups

Post’02 * Treatment 0.118* 0.028 0.015 -0.107* -0.059

(0.068) (0.091) (0.105) (0.059) (0.125)

Tenure * Treatment -0.076 -0.018 0.012 0.058 -0.05

(0.061) (0.082) (0.098) (0.056) (0.102)

Number of Observations 46810 41180 43562 25910 26341

Number of Individuals 2604 2291 2423 2175 2287

Log Likelihood -76523.381 -187507.992 -2535716.145 -38137.036 -401924.46

Panel G: Controlling for 7 Years Before to 8 Years after Tenure, Less Pre-Tenure Year

Post’02 * Treatment 0.147*** 0.115** 0.107 -0.108*** -0.115*

(0.043) (0.059) (0.070) (0.038) (0.070)

Tenure * Treatment -0.016 0.026 0.017 0.033 -0.013

(0.040) (0.052) (0.064) (0.036) (0.067)

Number of Observations 83937 74326 78308 47052 47789

Number of Individuals 4671 4136 4357 3917 4110

Log Likelihood -136646.313 -338239.390 -4508059.852 -70680.751 -736200.571

Chi-squared 2865.820 2854.092 1098.897 607.815 475.542

Panel H: Controlling for 6 Years Before to 6 Years after Tenure

# Publications IF-rated publications Citations Average IF-rating Average citations

Post’02 * Treatment 0.168*** 0.131*** 0.161*** -0.071** -0.044

(0.038) (0.050) (0.061) (0.034) (0.062)

Tenure * Treatment -0.01 0.03 0.018 0.026 -0.018

(0.041) (0.053) (0.064) (0.036) (0.067)

Number of Observations 83937 74326 78308 47052 47789

Number of Individuals 4671 4136 4357 3917 4110

Log Likelihood -136647.71 -338272.363 -4509293.616 -70684.686 -736633.658

Notes: In Panel A, the dependent variables are based on a restricted set of publications, including only journal articles and proceedings papers (ISI Web of Science categories: “Article”, “Article: Book”, “Article: Book Chapter”, “Article: Proceedings Paper”, “Proceedings Paper”). Panel B shows the results of the estimation of the baseline specification as a fixed effects panel data model with the inverse hyperbolic sine transformation of the dependent variables as dependent variables. In Panel C, the time-to-tenure fixed effects in the baseline specification are replaced by an absolute time-to-tenure variable, which is 0 in the first year of the first tenured position, 1 both in the year before and after, and so on. In Panel D, the Tenure * Treatment interaction is replaced by a Post’05 * Treatment interaction to capture the effect of the explicit performance incentives, where Post’05 is 1 as of 2005. In Panel E, the treatment group is comprised of academics who start their first tenured affiliation in 2004-2008 and the control group of academics who start their first tenured affiliation in 2001-2004. In Panel F, the control group starts their first tenured affiliation in 2003 or 2004, and the treatment group in 2005 and 2006. In Panel G, the time-to-tenure fixed effect of the tenure year is dropped, while a time-to-tenure fixed effect for the eighth year after tenure is added. In Panel H, 13 time-to-tenure fixed effects are included instead of 15, from 6 years before tenure to 6 years after. All other specifications as before.
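Two of the variable constructions described in these notes can be sketched in a few lines: the inverse hyperbolic sine transformation used in Panel B and the absolute time-to-tenure variable used in Panel C. This is a minimal illustration; the function names `ihs` and `abs_time_to_tenure` are hypothetical and not taken from the paper.

```python
import math


def ihs(y: float) -> float:
    """Inverse hyperbolic sine: log(y + sqrt(y^2 + 1)).

    Unlike log(y), it is defined at y = 0, which matters for years
    without any publications or citations.
    """
    return math.asinh(y)


def abs_time_to_tenure(year: int, first_tenure_year: int) -> int:
    """0 in the first tenured year, 1 in the year before and after, and so on."""
    return abs(year - first_tenure_year)
```

For large values the transformation behaves like log(2y), so coefficients can be read approximately as proportional changes, while zeros remain in the sample.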

Table A.4: Natural and Applied Science vs. Social Sciences and Humanities - Interactions

# Publications IF-rated publications Citations Average IF-rating Average citations

Post’02 * Treatment 0.260** -0.127 0.027 -0.272*** -0.185

(0.102) (0.168) (0.196) (0.101) (0.167)

Post’02 * Treatment * Nat. Apl Sci. -0.131 0.238 0.080 0.188* 0.085

(0.114) (0.184) (0.211) (0.111) (0.181)

Tenure * Treatment -0.019 0.168 -0.080 0.187* -0.119

(0.112) (0.165) (0.185) (0.102) (0.143)

Tenure * Treatment * Nat. Apl Sci. 0.016 -0.133 0.102 -0.172* 0.090

(0.098) (0.148) (0.172) (0.095) (0.135)

Number of Observations 80953 71540 75504 45588 46324

Number of Individuals 4505 3981 4201 3771 3966

Log Likelihood -132161.984 -326341.640 -4342991.519 -68490.957 -710222.214

Chi-squared 2910.339 2968.235 1113.439 669.440 480.945

Notes: Results are from the estimation of the baseline model augmented with a dummy variable for “Natural and Applied Sciences”, Nat. Apl Sci., and its double and triple interactions with Post’02, Tenure and Treatment. Mathematics, physics and informatics, biology, chemistry, earth sciences, pharmacology, engineering, medicine, dentistry, veterinary, agricultural science and nutrition science are classified as natural and applied sciences. The omitted category, social sciences and humanities, comprises theology, philosophy and history, philology and anthropology, law, economics and other social sciences. All other specifications as before.

Table A.5: Heterogeneous Results - Robustness

Panel A: Without Switchers

# Publications IF-rated publications Citations Average IF-rating Average citations

Post’02 * Treatment 0.302*** 0.331*** 0.309** -0.086 -0.014

(0.078) (0.128) (0.141) (0.088) (0.120)

Post’02 * Treatment * Top Decile -0.167* -0.196 -0.109 0.012 0.021

(0.094) (0.145) (0.166) (0.111) (0.151)

Post’02 * Treatment * 9th Decile -0.327*** -0.461*** -0.387** -0.049 -0.099

(0.105) (0.156) (0.187) (0.108) (0.169)

Post’02 * Treatment * 8th Decile -0.113 -0.271* -0.556*** -0.051 -0.407**

(0.109) (0.162) (0.198) (0.106) (0.201)

Post’02 * Treatment * 7th Decile 0.036 0.028 -0.068 -0.075 -0.323*

(0.115) (0.167) (0.195) (0.146) (0.195)

Post’02 * Treatment * 6th Decile 0.040 0.274 0.301 0.272** 0.331*

(0.142) (0.282) (0.278) (0.124) (0.188)

Tenure * Treatment -0.025 0.059 0.021 0.064 -0.087

(0.084) (0.123) (0.148) (0.083) (0.121)

Tenure * Treatment * Top Decile -0.006 -0.105 -0.171 -0.087 -0.070

(0.086) (0.123) (0.155) (0.103) (0.148)

Tenure * Treatment * 9th Decile 0.131 0.063 0.147 -0.077 0.134

(0.108) (0.152) (0.198) (0.096) (0.156)

Tenure * Treatment * 8th Decile 0.078 0.058 0.216 -0.037 0.166

(0.098) (0.136) (0.181) (0.093) (0.170)

Tenure * Treatment * 7th Decile -0.012 -0.160 -0.139 0.062 0.239

(0.111) (0.151) (0.214) (0.136) (0.193)

Tenure * Treatment * 6th Decile 0.047 -0.074 -0.009 -0.214** -0.092

(0.129) (0.228) (0.254) (0.108) (0.177)

Number of Observations 70414 62260 65702 39741 40369

Number of Individuals 3919 3465 3656 3281 3441

Log Likelihood -113340.774 -279883.526 -3761432.559 -60249.603 -629560.855

Chi-squared 2896.039 2925.687 1302.279 584.031 435.866


Panel B: Deciles Based on Number of Citations

# Publications IF-rated publications Citations Average IF-rating Average citations

Post’02 * Treatment 0.301*** 0.314** 0.390*** -0.127* -0.054

(0.072) (0.122) (0.137) (0.075) (0.121)

Post’02 * Treatment * Top Decile -0.181** -0.208 -0.191 0.016 0.097

(0.088) (0.137) (0.157) (0.092) (0.152)

Post’02 * Treatment * 9th Decile -0.204** -0.257* -0.473*** 0.123 -0.134

(0.103) (0.155) (0.181) (0.091) (0.176)

Post’02 * Treatment * 8th Decile -0.165 -0.248 -0.514*** 0.026 -0.227

(0.101) (0.154) (0.178) (0.099) (0.163)

Post’02 * Treatment * 7th Decile -0.246** -0.248 -0.418** 0.116 -0.051

(0.114) (0.166) (0.197) (0.101) (0.160)

Post’02 * Treatment * 6th Decile -0.182 -0.271 -0.273 0.048 -0.002

(0.114) (0.187) (0.208) (0.127) (0.189)

Tenure * Treatment -0.081 0.023 0.001 0.108 0.029

(0.081) (0.121) (0.139) (0.073) (0.120)

Tenure * Treatment * Top Decile 0.046 -0.025 -0.075 -0.066 -0.140

(0.081) (0.119) (0.143) (0.085) (0.139)

Tenure * Treatment * 9th Decile 0.147 0.050 0.024 -0.186** -0.144

(0.095) (0.142) (0.174) (0.081) (0.152)

Tenure * Treatment * 8th Decile 0.181* 0.090 0.190 -0.095 0.067

(0.096) (0.132) (0.174) (0.084) (0.145)

Tenure * Treatment * 7th Decile 0.097 -0.009 0.045 -0.085 -0.097

(0.104) (0.144) (0.185) (0.097) (0.166)

Tenure * Treatment * 6th Decile 0.039 -0.019 0.088 -0.162 -0.123

(0.107) (0.167) (0.196) (0.112) (0.168)

Number of Observations 80953 71540 75504 45588 46324

Number of Individuals 4505 3981 4201 3771 3966

Log Likelihood -131506.164 -324593.313 -4269946.815 -68478.753 -705212.935

Chi-squared 3132.735 2988.259 1520.744 715.053 543.353

Notes: Results are from the estimation of the baseline model augmented with dummy variables for productivity deciles and their double and triple interactions with Post’02, Tenure and Treatment. In Panel A, the specification includes interactions with indicator variables for the five top deciles, where these productivity deciles are determined on the basis of the averages of the impact factor-rated number of publications over the three pre-announcement years 1999, 2000 and 2001, separately by academic field and treatment group. Furthermore, the control group in this panel is restricted to only those academics that do not switch to the performance pay scheme, where any first affiliation, position or contract change after implementation of the pay reform (as of 2005) is considered a switch. In Panel B, the productivity deciles are determined on the basis of the averages of the sum of citations to publications over the three pre-announcement years 1999, 2000 and 2001, separately by academic field and treatment group. All specifications control for year and individual fixed effects and fifteen time-to-tenure fixed effects (from seven years before the tenure year to seven years after).
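The decile assignment described in these notes can be sketched as below. This is an illustration under stated assumptions, not the paper's code: the record keys (`id`, `field`, `treated`, `pre_avg`) and function names are hypothetical, years without a record are treated as zero output, and ties are broken by input order.

```python
from collections import defaultdict

PRE_ANNOUNCEMENT_YEARS = (1999, 2000, 2001)


def pre_announcement_average(yearly_output):
    """Average output over 1999-2001; missing years count as zero (assumption)."""
    total = sum(yearly_output.get(y, 0.0) for y in PRE_ANNOUNCEMENT_YEARS)
    return total / len(PRE_ANNOUNCEMENT_YEARS)


def assign_deciles(academics):
    """Rank academics within (field, treatment group) and return {id: decile}.

    Decile 10 is the top decile. Each academic is a dict with the keys
    'id', 'field', 'treated' and 'pre_avg'.
    """
    groups = defaultdict(list)
    for a in academics:
        groups[(a["field"], a["treated"])].append(a)

    deciles = {}
    for members in groups.values():
        ranked = sorted(members, key=lambda a: a["pre_avg"])
        n = len(ranked)
        for rank, a in enumerate(ranked):
            # rank runs from 0 to n - 1, so the result lies in 1..10.
            deciles[a["id"]] = rank * 10 // n + 1
    return deciles
```

Ranking within field and treatment group keeps the comparison among academics who face similar publication norms and the same incentive regime.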

Table A.6: Selection Analysis - Robustness Checks

Panel A: Above vs Below Median

                                    Treatment vs Control                Placebo
                                    1a          1b          1c          2a          2b          2c
Treatment                           -0.226***   -0.060      -0.673      -0.035      -1.789***   -1.502*
                                    (0.048)     (0.430)     (0.537)     (0.064)     (0.642)     (0.877)
Above median                        0.154***    0.029       -2.456***   0.278***    0.264***    -0.221
                                    (0.050)     (0.066)     (0.683)     (0.071)     (0.091)     (0.811)
Age                                 -0.296***   -0.295***   -0.323***   -0.416***   -0.449***   -0.454***
                                    (0.015)     (0.017)     (0.016)     (0.036)     (0.040)     (0.041)
Age * Treatment                                 -0.006      0.008                   0.037***    0.031*
                                                (0.009)     (0.012)                 (0.013)     (0.018)
Above median * Treatment                        0.262***    1.938**                 0.033       -0.594
                                                (0.085)     (0.880)                 (0.123)     (1.202)
Above median * Age                                          0.056***                            0.011
                                                            (0.015)                             (0.018)
Above median * Age * Treatment                              -0.039**                            0.012
                                                            (0.019)                             (0.025)
Age at Tenure                       0.186***    0.188***    0.195***    0.283***    0.297***    0.297***
                                    (0.016)     (0.016)     (0.015)     (0.036)     (0.038)     (0.038)
Constant                            1.752***    1.634***    2.586***    1.769***    2.551***    2.776***
                                    (0.259)     (0.348)     (0.401)     (0.376)     (0.457)     (0.599)
Number of Observations              80131       80131       80131       51431       51431       51431
Number of Subjects                  14972       14972       14972       6960        6960        6960
Number of Switches                  2435        2435        2435        1099        1099        1099
Log Likelihood                      -7409.954   -7404.629   -7394.647   -3179.231   -3175.077   -3174.110
Chi-squared                         1177.857    1303.095    1404.077    559.784     571.198     561.794
Rho                                 1.633       1.640       1.655       2.438       2.525       2.525

Panel B: Cox Proportional Hazard Model; Panel C: Baseline with Field Strata

                                    Panel B                             Panel C
                                    1a          1b          1c          1a          1b          1c
Treatment                           -0.166***   1.304***    1.214***    -0.237***   0.152       -0.114
                                    (0.050)     (0.409)     (0.415)     (0.048)     (0.426)     (0.438)
Age                                 -0.128***   -0.113***   -0.120***   -0.298***   -0.299***   -0.314***
                                    (0.019)     (0.019)     (0.020)     (0.015)     (0.017)     (0.017)
Avg Productivity                    0.002***    -0.001      -0.036***   0.001       -0.004*     -0.055***
                                    (0.001)     (0.002)     (0.012)     (0.001)     (0.002)     (0.013)
Age * Treatment                                 -0.033***   -0.031***               -0.010      -0.003
                                                (0.009)     (0.009)                 (0.009)     (0.010)
Avg Productivity * Treatment                    0.004**     0.031**                 0.007***    0.050***
                                                (0.002)     (0.013)                 (0.002)     (0.014)
Avg Productivity * Age                                      0.001***                            0.001***
                                                            (0.000)                             (0.000)
Avg Productivity * Age * Treatment                          -0.001**                            -0.001***
                                                            (0.000)                             (0.000)
Age at Tenure                       0.025       0.027       0.029       0.187***    0.194***    0.201***
                                    (0.019)     (0.019)     (0.019)     (0.016)     (0.015)     (0.015)
Constant                                                                1.493***    1.233***    1.549***
                                                                        (0.359)     (0.432)     (0.435)
Number of Observations              80131       80131       80131       80131       80131       80131
Number of Subjects                  14972       14972       14972       14972       14972       14972
Number of Switches                  2435        2435        2435        2435        2435        2435
Log Likelihood                      -21444.006  -21434.451  -21430.395
Chi-squared                         686.767     689.587     710.192

Notes: The table reports estimation results of Weibull (Panels A and C) and Cox (Panel B) proportional hazard models of selection into performance pay. In Panel A, “Above Median” is a dummy variable that is 1 for academics whose pre-reform average productivity is above the median average productivity of academics in the same field and broad tenure cohort (tenure before versus after 2005). In columns 1a-1c, the treatment variable is 1 for academics who have made tenure before 2005 and 0 for those who make tenure afterwards. Columns 2a-2c report the results for a placebo experiment where the placebo-treatment group comprises academics who start their first tenured position before 2002, while academics who start their first tenured affiliation between 2002 and 2005 act as placebo-control group. In Panel C, the Weibull model is estimated with strata for academic fields. “Avg Productivity” is calculated as three year pre-implementation averages (2002-2004) of the impact factor-rated number of publications. All other specifications as in Table 4.

Table A.7: Switching Analysis

                                    Weibull Model           Cox PH Model
                                    1a          1b          2a          2b
Age                                 -0.160***   -0.151***   -0.123***   -0.115***
                                    (0.006)     (0.008)     (0.006)     (0.008)
Avg Productivity                    0.002**     0.000       0.002***    0.001
                                    (0.001)     (0.001)     (0.001)     (0.001)
Post                                            -0.022                  0.628
                                                (0.411)                 (0.440)
Post * Age                                      -0.012                  -0.015
                                                (0.009)                 (0.010)
Post * Avg Productivity                         0.003*                  0.003*
                                                (0.002)                 (0.001)
Constant                            3.597***    3.262***
                                    (0.286)     (0.378)
Number of Observations              65639       65639       65639       65639
Number of Subjects                  7248        7248        7248        7248
Number of Switches                  1599        1599        1599        1599
Log Likelihood                      -5122.780   -5089.336   -13575.028  -13572.396
Chi-squared                         872.500     952.756     638.909     655.997

Notes: The table reports estimation results of Weibull (columns 1a-1b) and Cox (columns 2a-2b) proportional hazard models of affiliation, position or contract switches. “Post” is 1 as of 2005. All other specifications as in Table 4.

Table A.8: Instrumental Variables Estimation of Effort Effect

Panel A: Age and Age-squared IV

# Publications IF-rated publications Citations Average IF-rating Average citations

Post’02 * Treatment 2.399*** 7.578*** 86.587*** -0.207 -23.968

(0.533) (2.373) (29.272) (0.440) (18.353)

Contract Change * Treatment 0.725 12.541** 65.253 0.559 -0.642

(1.061) (5.655) (49.292) (0.737) (17.236)

Number of Observations 57374 57374 57374 25721 25721

Number of Individuals 3197 3197 3197 2228 2228

Log Likelihood -157177.424 -249583.010 -392578.018 -54032.603 -139540.827

Test Statistics for Over-, Weak and Underidentification of Instruments and Endogeneity of Regressors

Kleibergen-Paap rk LM statistic 100.817 100.817 100.817 37.442 37.442

Chi-squared p-value 0 0 0 0 0

Kleibergen-Paap rk Wald F statistic 28.179 28.179 28.179 9.872 9.872

Stock-Yogo Critical values [5%; 10%] [5%; 10%] [5%; 10%] [10%; 20%] [10%; 20%]

Hansen J-statistic 13.799 20.988 17.462 0.204 1.991

Chi-squared p-value 0.001 0 0 0.9029 0.3695

Endogeneity Test 11.552 6.689 9.1 0.767 2.266

Chi-squared p-value 0.003 0.035 0.011 0.6813 0.3221

Panel B: Over-32 and Over-42 IV

Post’02 * Treatment 1.910*** 7.147** 50.690 0.381 -27.113

(0.608) (2.933) (34.866) (0.496) (16.718)

Contract Change * Treatment -0.260 2.704 -29.215 0.099 -2.356

(1.219) (6.427) (65.107) (0.955) (26.063)

Number of Observations 57374 57374 57374 25721 25721

Number of Individuals 3197 3197 3197 2228 2228

Log Likelihood -156924.517 -249105.190 -392319.928 -54031.927 -139576.869

Test Statistics for Over-, Weak and Underidentification of Instruments and Endogeneity of Regressors

Kleibergen-Paap rk LM statistic 65.318 65.318 65.318 20.401 20.401

Chi-squared p-value 0.000 0.000 0.000 0.000 0.000

Kleibergen-Paap rk Wald F statistic 17.241 17.241 17.241 5.559 5.559

Stock-Yogo Critical values [5%; 10%] [5%; 10%] [5%; 10%] [20%; >25%] [20%; >25%]

Hansen J-statistic 3.342 1.37 2.637 3.248 1.179

Chi-squared p-value 0.188 0.504 0.268 0.197 0.555

Endogeneity Test 4.699 2.012 0.903 0.575 2.989

Chi-squared p-value 0.095 0.366 0.637 0.75 0.224

Panel C: Panel Fixed Effects Model

Post’02 * Treatment 0.707*** 3.235*** 19.680** -0.005 0.948

(0.202) (1.052) (8.961) (0.078) (1.978)

Contract Change * Treatment 0.058 0.113 -3.702 -0.088 -0.070

(0.245) (1.199) (9.467) (0.079) (1.698)

Number of Observations 57374 57374 57374 25974 25974

Number of Individuals 3197 3197 3197 2481 2481

Log Likelihood -156813.461 -249006.509 -392300.527 -54405.337 -140674.500

Notes: The unit of observation is academic i. The sample is restricted to academics who started their first tenured affiliation at a German public university in 2002, 2003 and 2004 (excluding those with a foreign affiliation directly prior to this) and includes data from 1993 until and including 2012. Contract Change is 1 starting from the year in which an academic who already holds a tenured affiliation changes their position, affiliation or contract (receives an outside offer) and every year thereafter. Treatment is 1 for academics who experience a contract change as of 2005 and 0 otherwise. Panels A and B report the results of 2-step GMM estimation of an instrumented baseline regression. In Panel A, the Post’02 * Treatment and Contract Change * Treatment interactions are instrumented by the Post’02 * Synthetic Age, Post’02 * Synthetic Age-Squared, Post’05 * Synthetic Age and Post’05 * Synthetic Age-Squared variables. In Panel B, the Post’02 * Treatment and Contract Change * Treatment interactions are instrumented by the Post’02 * Over-32, Post’02 * Over-42, Post’05 * Over-32 and Post’05 * Over-42 variables. The instruments are based on synthetic age variables for which unknown ages are imputed using the average age at PhD, habilitation or tenure. The Over-32 and Over-42 variables are 1 whenever the synthetic age of the academic is larger than 32, respectively 42, and 0 otherwise. Panel C reports the estimation results of the equivalent panel fixed effects model (not instrumented). All specifications include 15 year-to-tenure dummies, year and individual fixed effects. Robust standard errors, clustered at the individual level, are reported throughout.
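The Panel B instruments can be constructed mechanically from a calendar year and a synthetic age, as sketched below. This is a minimal illustration: the function name and dictionary keys are hypothetical, and it assumes Post’02 is 1 from 2002 onward, by analogy with Post’05 being 1 as of 2005.

```python
def over_age_instruments(year: int, synthetic_age: float) -> dict:
    """Build the four Post * Over-age interaction instruments for one observation."""
    post02 = 1 if year >= 2002 else 0  # assumption: 1 from 2002 onward
    post05 = 1 if year >= 2005 else 0  # 1 as of 2005
    over32 = 1 if synthetic_age > 32 else 0  # strictly larger than 32
    over42 = 1 if synthetic_age > 42 else 0  # strictly larger than 42
    return {
        "post02_over32": post02 * over32,
        "post02_over42": post02 * over42,
        "post05_over32": post05 * over32,
        "post05_over42": post05 * over42,
    }
```

The Panel A instruments follow the same pattern, with the synthetic age and its square in place of the two threshold indicators.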

Table A.9: Tenure Probability Analysis

Panel A (first four columns): Productivity variables lagged by two years

Panel B (last four columns): Productivity variables based on impact factor-rated number of publications weighted by number of authors

Number of publications IF-rated publications Number of publications IF-rated publications

1a 1b 2a 2b 1a 1b 2a 2b

Age -0.262*** -0.142*** -0.262*** -0.142*** -0.249*** -0.141*** -0.250*** -0.141***

(0.007) (0.006) (0.007) (0.006) (0.007) (0.006) (0.007) (0.006)

Productivity 0.014*** 0.015*** 0.003*** 0.004*** 0.014*** 0.016*** 0.003*** 0.004***

(0.001) (0.002) (0.000) (0.001) (0.001) (0.002) (0.000) (0.001)

Post’05 7.144*** 7.160*** 6.643*** 6.657***

(0.248) (0.248) (0.245) (0.244)

Post’05 * Age -0.172*** -0.172*** -0.160*** -0.160***

(0.006) (0.006) (0.006) (0.006)

Post’05 * Productivity -0.001 -0.001 -0.003 -0.001*

(0.002) (0.001) (0.002) (0.001)

Constant 4.917*** -0.193 4.933*** -0.19 4.401*** -0.230 4.416*** -0.227

(0.228) (0.238) (0.227) (0.238) (0.222) (0.236) (0.222) (0.236)

Number of Observations 213514 213514 213514 213514 199583 199583 199583 199583

Number of Subjects 26847 26847 26847 26847 26453 26453 26453 26453

Number of Tenure Starts 12749 12749 12749 12749 12420 12420 12420 12420

Log Likelihood -21934.532 -21407.680 -21933.988 -21405.290 -21171.915 -20727.539 -21165.605 -20719.165

Chi-squared 3133.269 4340.346 3116.022 4329.72 2994.657 4016.870 3005.443 4042.777

Rho 2.618 2.644 2.619 2.646 2.611 2.638 2.613 2.640

Notes: The unit of observation is academic i. The table reports estimation results of Weibull proportional hazard models of transitions into first tenured positions. The event of interest here (the “failure” event) is the start of the first tenured affiliation, for academics who do not hold a foreign affiliation immediately preceding this change. The time from the completion of the habilitation until the first tenured position is used as duration variable and 1998 is the entry date. Whenever the actual habilitation year is unknown, it is imputed using the average age at completion of the habilitation and an academic’s (synthetic) age. Academics are “at risk” of obtaining a tenured position from this (synthetic) habilitation year onwards. The age variable is equal to an author’s self-reported age if known, and equal to a synthetic age otherwise. The synthetic age is calculated using the average age at habilitation, promotion or tenure. In columns 1a and 1b, the Productivity variable used is the number of publications; in columns 2a and 2b it is the impact factor-rated number of publications. In Panel A, these productivity variables are lagged by two years, while they are lagged by one year in Panel B. All models control for field fixed effects and are estimated for the years 1999-2013 (Panel A) or 1999-2012 (Panel B). Academics are dropped once they pass away or retire. Standard errors are robust and clustered by individual academic.

Table A.10: Stop Words

me it being against when will

my its have between where just

myself it’s has into why don’t

we itself had through how should

our they having during all should’ve

ours them do before any now

ourselves their does after both aren’t

you theirs did above each couldn’t

you’re themselves doing below few didn’t

you’ve what an to more doesn’t

you’ll which the from most hadn’t

you’d who and up other hasn’t

your whom but down some haven’t

yours this if in such isn’t

yourself that or out no mightn’t

yourselves that’ll because on nor mustn’t

he these as off not needn’t

him those until over only shan’t

his am while under own shouldn’t

himself is of again same wasn’t

she are at further so weren’t

she’s was by then than won’t

her were for once too wouldn’t

hers be with here very

herself been about there can
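How a stop word list of this kind might enter a cosine similarity computation can be sketched as follows. This is an illustration only: it assumes simple whitespace tokenization and raw term counts, neither of which is confirmed by this appendix, `STOP_WORDS` holds just a small excerpt of the table above, and the function names are hypothetical.

```python
import math
from collections import Counter

# Small excerpt of the stop words in Table A.10 (not the full list).
STOP_WORDS = {"the", "and", "of", "in", "to", "for", "with", "is", "are"}


def tokenize(text: str) -> list:
    """Lowercase, split on whitespace and drop stop words (no punctuation handling)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the term-count vectors of two texts."""
    va, vb = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(count * vb[word] for word, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a text with no remaining words is treated as dissimilar
    return dot / (norm_a * norm_b)
```

Removing stop words first keeps very common function words from inflating the similarity between otherwise unrelated texts.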
