Equality of Opportunity of Education in Germany. Evidence from a quasi-natural experiment.∗
Sebastian Camarero Garcia †
July 2017, [Corresponding Working Paper soon available]
Abstract
The goal of this paper is to shed light on how equality of opportunity in education (Equality of Educational Opportunity (EEOp) or, respectively, Inequality of Educational Opportunity (IEOp)) may be shaped by the recent trend to accelerate and intensify the educational process. For this purpose, I analyze the impact of a controversial reform in Germany that shortened the duration of secondary school (Gymnasium) by one school year, from 9 to 8 years, while keeping the curriculum unchanged. Both the first student cohort in the new 8-year system and the last cohort taught for 9 years had to pass the same final university access diploma exams. The sharp, staggered introduction of this reform across the German federal states can thus be exploited as a quasi-experimental setting, which allows estimating the effect of the reform-induced increase in learning intensity on IEOp in a two-step Difference-in-Difference estimation approach (DID). To measure this effect, I use the most recent available German-specific data from the Program for International Student Assessment (PISA) studies 2003, 2006, 2009 and 2012 (PISA-I-2003-2012), which provide comparable measures of cognitive skills in Reading, Mathematics and Sciences for students tested at the end of the 9th grade. Regression findings suggest that the increased learning intensity induced by the Gymnasium-8-reform (G-8-reform) did not improve EEOp. In the short term, IEOp appears not to have changed. In the medium term, however, a larger fraction of the variation in test scores can be explained by circumstances beyond the control of a 9th-grade student. Thus, the analysis indicates that the reform-induced increase in learning intensity aggravated IEOp, though only after some time, for instance because favorable circumstances such as private tuition opportunities may have materialized into test score differences only after a period of adjustment.
Moreover, results provide evidence for the existence of subject-dependent curricular flexibilities, with Maths/Sciences being less flexible, and thus more responsive to changing learning intensity, than Reading. This paper is thus one of the first to provide causal estimates, based on a quasi-experimental setting, of how a factor such as learning intensity affects IEOp (and hence also social mobility).
JEL-Classification: D39, D63, I24, I29, O52
Keywords: Equality of Opportunity, Education & Inequality, Learning Intensity, German School System
∗I would like to thank my supervisor Andreas Peichl. Moreover, I would like to thank Felix Chopra, Cung Truong Hoang, Kilian Huber, Paul Hufe, Stephen Kastroyano, Panos Mavrokonstantis, Tim Obermeier, Federico Rossi, and David Schönholzer as well as the participants at the Public Economics and the CDSE Seminar at the ZEW/University of Mannheim for helpful suggestions and discussions. The usual disclaimer applies. †PhD-Candidate at CDSE - University of Mannheim; E-Mail: [email protected]
List of Abbreviations
CTT common time trend.
DID Difference-in-Difference estimation approach.
EEOp Equality of Educational Opportunity.
EOp Equality of Opportunity.
ESCS PISA index of economic, social and cultural status.
FC Family Characteristics.
FE fixed effect.
G-8-model Gymnasium-8-model.
G-8-reform Gymnasium-8-reform.
G-9-model Gymnasium-9-model.
GDR German Democratic Republic (1949-1990), whose territory comprised today's German federal states Brandenburg (BB), Mecklenburg-Western Pomerania (MWP), Saxony (S), Saxony-Anhalt (ST) and Thuringia (TH), as well as East Berlin (BE).
IC Individual Characteristics.
IEOp Inequality of Educational Opportunity.
IOL absolute measure of Inequality of Opportunity.
IOp Inequality of Opportunity.
IOR relative measure of Inequality of Opportunity (IOp).
IQB Institut zur Qualitätsentwicklung im Bildungswesen (Institute for Educational Quality Improvement).
ISCED International Standard Classification of Education.
ISCO International Standard Classification of Occupations.
ISEI International Socio-Economic Index of Occupational Status.
OECD Organisation for Economic Co-operation and Development.
OLS Ordinary Least Squares.
PC Parental Characteristics.
PISA Program for International Student Assessment.
SC Standing Conference of the Ministers of Education and Cultural Affairs of the Länder in the Federal Republic of Germany.
SES socio-economic status.
1 Introduction
In an era of relatively high income and wealth inequality compared to the post-war decades in most Western countries (Piketty and Zucman, 2014), many people feel anger and discomfort about the economic system and their prospects. In democratic market-oriented societies, the general belief that everyone who works and studies hard has a fair chance to climb the social ladder has been central to maintaining the cohesion and stability of the political and economic system. Therefore, many analysts today suggest that a growing number both of citizens who fear that their children may be worse off in the future (fear of downward mobility) and of groups in society who think that the "game is rigged" (fear of a lack of upward mobility) may be crucial for explaining rising political polarization within most Western countries (e.g. Brexit, Trump's election). In other words, social mobility1 is becoming an increasingly important issue in the broader effort to understand the drivers of, and find answers to, recent trends in inequality within society. Knowing more about the extent of Inequality of Opportunity (IOp), and in particular of IOp in education (IEOp)2, is therefore important, since it appears to be one central margin that influences social mobility (Chetty, Friedman, Saez, Turner, and Yagan, 2017), education being said to be the main vehicle for upward mobility (Woessmann, Lergetporer, Kugler, and Werner, 2014). Focusing on Germany, two aspects illustrate these points. First, wealth is distributed quite unequally: 10% of society own 60% of total wealth, while the bottom 20% own nothing or are indebted. Social justice, and thus EOp, has therefore become central in the political debate (The Economist, 2016). Second, as in other industrial countries, the level of social mobility observed in the post-war decades has declined since the 1980s.
For instance, the Organisation for Economic Co-operation and Development (OECD) has repeatedly shown that Germany belongs to the countries with the lowest upward mobility, with educational success being highly dependent on a student's parental education background (compare OECD (2013b) and Figure A.1). This illustrates the importance of the notion of EEOp.
In this paper, I adopt the canonical interpretation, as illustrated by Ferreira and Peragine (2015), which states that a society has achieved equality of opportunity if what individuals achieve with respect to some desirable objective is fully determined by their choices and personal efforts instead of by circumstances beyond their control. Circumstances are all the factors that an individual cannot control but that affect her outcome, while effort encompasses the choices an individual makes (e.g. how hard to study). According to the EOp normative criterion, inequality due to unequal circumstances is unfair because it stems from factors outside individual control, whereas inequality due to unequal efforts is morally acceptable.
One way of leveling the playing field3 is to offer everyone access to (at least) secondary school education. However, to what extent do individuals have an equal opportunity to achieve good educational results? In times of public spending constraints, accelerating growth of scientific knowledge and economic competition, the debate on educational policies among OECD countries has become more output-oriented. In fact, public attention has shifted to the key issue of how to make a country's educational system more effective. Accordingly, the economics of education has investigated many aspects of both schooling quantity and quality regarding their impact on cognitive skill formation and earnings.
1For an overview of the literature on social mobility, I refer to Fields and Ok (1999).
2IOp and Equality of Opportunity (EOp) refer to the same concept; they merely emphasize either what may be considered the unfair part of the distribution of opportunities (IOp) or the fair part (EOp). Thus, if opportunities depend less on factors beyond the control of an individual and more on efforts, one may state either that EOp has increased or that IOp has decreased. In the following, I use both terminologies so as to ease the interpretation of results. Finally, following Brunori, Peragine, and Serlenga (2012), I use the expression EEOp for EOp in education and, correspondingly, IEOp for IOp in education.
3The expression "leveling the playing field" in the context of EOp was first used by Roemer (Ferreira and Peragine, 2015).
To a much lesser extent, however, learning intensity, a factor combining characteristics of schooling quality and quantity, has been analyzed (e.g. Büttner and Thomsen (2015); Marcotte (2007); Pischke (2007)). With respect to schooling, learning intensity can be defined as the ratio of the amount of curricular content covered to the amount of instructional time available. From a social welfare perspective, it is interesting to reveal the effects of increasing learning intensity on both academic achievement and EEOp. For instance, understanding how more intense education influences the formation of skills may help improve the efficiency of educational systems. Pareto improvements may thus be achieved if learning intensity turned out to be an instrument for resolving the trade-off between educational spending and output, i.e. academic merits.
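As a back-of-the-envelope illustration of this definition, the sketch below normalizes the full curriculum to one unit of content and measures instructional time in school years. Both are simplifying assumptions: changes in weekly hours or in how content is spread across grades are ignored. Holding content fixed while cutting duration from nine to eight years, as in the reform described below, raises intensity by a factor of 9/8.

```python
from fractions import Fraction


def learning_intensity(content: Fraction, years: Fraction) -> Fraction:
    """Curricular content covered per school year of instruction."""
    return content / years


# Hypothetical normalization: the full Gymnasium curriculum counts as 1 unit.
CURRICULUM = Fraction(1)

li_g9 = learning_intensity(CURRICULUM, Fraction(9))  # nine-year track (G-9)
li_g8 = learning_intensity(CURRICULUM, Fraction(8))  # eight-year track (G-8), same curriculum

ratio = li_g8 / li_g9          # 9/8
relative_increase = ratio - 1  # 1/8, i.e. 12.5%
print(f"G-8/G-9 intensity ratio: {ratio}, increase: {float(relative_increase):.1%}")
# prints: G-8/G-9 intensity ratio: 9/8, increase: 12.5%
```

Exact fractions are used so that the ratio is not blurred by floating-point rounding.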
Having described these two relevant and recent trends, the question of their interconnectedness arises: How does variation in the intensity of education affect social mobility? In this paper, however, I draw the reader's attention to the narrower question of how variation in learning intensity may affect IEOp, which is one part of answering the broader question.
To approach an answer to this specific research question, I focus on a reform in Germany that changed learning intensity. During the last decade, the German federal states gradually shortened the duration of secondary school in Gymnasium from nine to eight years, each at a different point in time between 2001 and 2008 (the so-called "G-8-reform"). While schooling duration was reduced, the curriculum was kept unchanged for the first affected (treated) cohorts, who thus experienced a considerable increase in learning intensity: in each state, the first treated cohort had only 8 years of schooling, in contrast to the 9 years of its predecessor cohort, which had entered secondary school just one year earlier. Because both groups were planned to take the same university access diploma exams in the same final year, treated students had less total time for homework or repetition and had to learn more material per school year (i.e. learning intensity increased). Therefore, the sharp, staggered introduction of the reform across federal states creates a quasi-experimental setting. This allows addressing common empirical deficiencies by applying a DID framework to estimate the effect of the reform-induced increase in learning intensity on IEOp.
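The DID logic can be sketched in its simplest two-group, two-period form, assuming that an IEOp share (a theta value, i.e. an R^2) has first been estimated for treated and not-yet-treated states before and after the reform. The numbers below are hypothetical, not PISA estimates, and the function name `did` is illustrative; this collapses the staggered multi-state design, whose actual specification is the one set out in section 6.

```python
def did(treat_pre: float, treat_post: float,
        control_pre: float, control_post: float) -> float:
    """Canonical 2x2 difference-in-differences on group-level outcomes.

    The change observed in control states stands in for the common time
    trend (CTT) that treated states would have followed without the reform.
    """
    return (treat_post - treat_pre) - (control_post - control_pre)


# Hypothetical IEOp shares (theta = R^2 values), NOT estimates from PISA.
theta = {
    ("treated", "pre"): 0.20, ("treated", "post"): 0.26,
    ("control", "pre"): 0.21, ("control", "post"): 0.23,
}
effect = did(theta[("treated", "pre")], theta[("treated", "post")],
             theta[("control", "pre")], theta[("control", "post")])
# (0.26 - 0.20) - (0.23 - 0.21) = 0.04, up to floating-point rounding
print(f"DID estimate of the reform effect on theta: {effect:.3f}")
```

The identifying assumption is that, absent the reform, theta in treated states would have moved in parallel with theta in control states.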
Thus, this paper aims to provide first answers to questions such as: Is it possible to reduce the amount of schooling by increasing learning intensity without affecting EEOp? Does the reform effect vary over time as students and their environment adjust to the reform? For this purpose, I use the German-specific PISA data, which provide a representative sample of students in the 9th grade with standardized test scores in Reading, Mathematics and Sciences. Based on the associated rich set of background variables, I classify relevant circumstances similarly to Ferreira and Gignoux (2014). Furthermore, surveys show that Germans remain split on whether Gymnasium should last 8 or 9 years (Woessmann, Lergetporer, Kugler, Oestreich, and Werner, 2015). In Eastern Germany, a majority supports the shortened school duration, whereas the opposite holds across Western federal states, which only recently adopted the new system. Thus, I contribute evidence for evaluating this controversial reform. Finally, knowing how learning intensity causally affects IEOp has policy implications for the design of curricula, taking into account both the effects on cognitive skill formation and those on EEOp. Such evidence is also needed to integrate the factor of intensity into the human capital literature.
The remainder of this paper is organized as follows. Section 2 introduces how IOp is defined and measured for the purpose of this study. Section 3 provides an overview of the related literature. Section 4 illustrates the relevant institutional background and the G-8-reform on which the quasi-experimental identification strategy is based. Section 5 then discusses the data used. Section 6 presents the empirical strategy, whose results, along with robustness checks, are provided in section 7. Finally, section 8 concludes.
2 Measuring Inequality of Opportunity in Education (IEOp)
The idea that societies should distribute opportunities equally has a long tradition in political philosophy. It gained significant attention in the philosophical discourse after Rawls' seminal contribution (Rawls, 1971) and the fruitful discussion that ensued (Sen (1980), Dworkin (1981a, 1981b), Arneson (1989), Cohen (1989)). Most importantly, this debate established the idea that a prerequisite for measuring EOp (or IOp) is to distinguish whether a form of inequality is morally acceptable within a society or not.4 Consequently, by the end of the 20th century, research in economic distributional analysis started to shift attention from outcomes to opportunities. Defining equality of opportunity as the objective (e.g. in a welfare analysis) allows taking into account that, on the one hand, people tend to accept outcome differences due to individual responsibility (efforts) given identical circumstances (reward principle), but, on the other hand, they also tend to consider it fair to compensate for differences that can be attributed only to circumstances beyond an individual's control (compensation principle). Regarding compensation, the literature distinguishes between an ex-ante principle (prior to the determination of the effort level) and an ex-post principle (after the determination of the effort level).5 Moreover, Lefranc and Trannoy (2016) have illustrated how luck may be incorporated as an intermediate category ("residual luck") between circumstances and efforts.
However, these ideas only started to capture the broader attention of economists in the 1990s, when scholars such as, in particular, Roemer (Roemer, 1998) translated these philosophical concepts into a more formal economic framework, establishing a canonical approach to measuring EOp in practice. An empirical literature on measuring EOp followed, in which several estimation methods have been proposed; recent surveys by Ramos and Van de gaer (2015) and Roemer and Trannoy (2015) examine both direct and indirect measurement approaches. For instance, following the indirect, ex-ante approach, many studies implement a parametric method proposed by Ferreira and Gignoux (2011) or Björklund, Jäntti, and Roemer (2012). Other approaches, including non-parametric estimation techniques (Checchi and Peragine, 2010), norm-based measures (Almås, Cappelen, Lind, Sørensen, and Tungodden, 2011) and stochastic dominance criteria (Lefranc, Pistolesi, and Trannoy, 2008), have been used as well.6 In the following, I define and explain the approach taken in this paper and how EEOp or IEOp should be understood in this context.7 To begin with, following the "canonical" model as formalized e.g. by Roemer (1998), it is useful to lay down a set of definitions to understand the concept of EOp in education, i.e. EEOp (compare e.g. Ferreira and Peragine (2015)):
• advantage: An advantage denotes an individual achievement (usually income, but in the context of this paper, it corresponds to educational outcomes as measured by PISA-test scores).
• efforts: The vector of efforts, E, denotes the set of variables that influence the outcome variable (advantage) and over which the student has some sort of control (e.g. choice of time for studying).
• circumstances: The vector of circumstances, C, denotes the set of individual characteristics which are beyond the student's control and for which she cannot be held responsible, e.g. the socio-economic status (SES) of the household she is born into, gender, ethnicity or innate ability/talents.
4For instance, there is strong experimental evidence that people distinguish between acceptable (fair) and unacceptable (unfair) income inequality (see e.g. Cappelen, Sørensen, and Tungodden (2010)). Inequality appears to be acceptable if differences are due to individual responsibility (efforts), but not if they are due to luck or randomness (circumstances).
5More details on the philosophical background and evolution of EOp theory can be found in Ferreira and Peragine (2015).
6A comprehensive overview is given by Ramos and Van de gaer (2015), chapter 3.
7For a broad and detailed overview of the main EOp measurement methods, I refer to chapter 4.10 in Roemer and Trannoy (2015), to chapters 3 and 4 in Ferreira and Peragine (2015) as well as to the survey by Ramos and Van de gaer (2015).
Given these definitions, we can formulate a conceptual framework illustrating how this paper approaches the measurement of EEOp (compare Ferreira and Gignoux (2011) and Ferreira and Gignoux (2014)). Consider a sample of $N$ students indexed by $i \in \{1, \ldots, N\}$. Each student $i$ can be described by a set of attributes $\{y, C_n, E_m\}$, where $y$ denotes an advantage (here, test scores), $C_n$ is a vector of discrete circumstances8 and $E_m$ denotes a vector of discrete efforts. We can thus represent the population by an $(n \times m)$ matrix $[Y_{nm}]$ with typical element $y_{nm} = g(C_n, E_m)$, $C \in \Omega$, $E \in \Theta$, where $g: \Omega \times \Theta \to \mathbb{R}$ gives the advantage as a function of both circumstances and efforts. Once one has agreed on which variables constitute the circumstance vector $C_i$ of each student $i$, a choice that is always open to question (Gamboa and Waltenberg, 2012), one can split the sample into $n$ distinct groups of students who share the same circumstances (i.e. they are of the same type). At the same time, the sample can be split into $m$ distinct groups of students who invest the same level of effort but may have different circumstances (i.e. they belong to the same tranche). Together, a type and a tranche form a cell, the typical element of the population matrix.
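To make the partition concrete, the sketch below groups a handful of hypothetical students into types, tranches and cells. The circumstance labels and effort levels are invented for illustration, and efforts are treated as observed here even though in practice they usually are not.

```python
from collections import Counter

# Hypothetical students: (circumstance vector C, effort level E, test score y).
students = [
    (("female", "tertiary"), "high", 560),
    (("female", "tertiary"), "low", 510),
    (("male", "tertiary"), "high", 555),
    (("male", "basic"), "high", 520),
    (("male", "basic"), "low", 470),
    (("female", "basic"), "low", 480),
]

types = Counter(c for c, _, _ in students)       # students sharing the same C
tranches = Counter(e for _, e, _ in students)    # students exerting the same E
cells = Counter((c, e) for c, e, _ in students)  # (type, tranche) pairs: elements of [Y_nm]

n, m = len(types), len(tranches)
print(f"{n} types x {m} tranches = {n * m} possible cells, {len(cells)} occupied")
# prints: 4 types x 2 tranches = 8 possible cells, 6 occupied
```

Even with two binary circumstances and two effort levels, two of the eight possible cells are empty; finer partitions quickly produce many small or empty cells, which is the sample-size problem that motivates the parametric approach adopted later in this section.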
Given this framework, one basically distinguishes between direct and indirect measurement approaches.9 Since direct approaches aim to model the opportunity sets explicitly, their implementation has been difficult, because opportunities are not directly observable. Instead, indirect approaches, which measure EOp based on the observed joint distributions of outcomes and circumstances, dominate the empirical literature (see section 3.1). Within these, one distinguishes between an ex-ante and an ex-post approach, which refers to how one evaluates EOp and thus to which normative welfare criterion is chosen (for an overview see Fleurbaey, Peragine, and Ramos (2015)). Before effort is realized (ex-ante), following van de Gaer's "min of means" criterion, EOp is achieved by equalizing mean outcomes across types, i.e. IOp is measured as between-type inequality, which satisfies ex-ante compensation and the reward principle. After effort is realized (ex-post), following Roemer's "mean of mins" criterion, EOp is achieved by eliminating inequality within tranches, which satisfies ex-post compensation. In the education context of this paper, one would thus consider inequality in scores to be fair, i.e. due only to efforts, if a student of a given type obtained a higher score than another student of the same type. However, a similar degree of effort exerted by students facing different circumstances (i.e. of the same tranche) should give rise to similar outcomes; otherwise, such inequality would be deemed unfair. Intuitively, the concept of EEOp, resp. IEOp, can thus be translated as follows: assuming talents to be distributed normally across the whole population, students who work harder, i.e. put in more effort, should be rewarded with good educational results regardless of their specific circumstances.
Thus, unfair IEOp corresponds to differences in educational achievement between students who put in the same efforts but differ only in terms of their circumstances.10 In contrast, differences in educational achievement that can be attributed to individual efforts are acceptable (reward principle).11 Therefore, IEOp captures differences between students that can be traced only to circumstances beyond their control.
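Under the ex-ante view, the between-type component can be computed non-parametrically by giving every student the mean score of her type and comparing the inequality of this smoothed distribution with total inequality. The sketch below assumes the variance as inequality index (the choice justified later in this section) and uses hypothetical types and scores; `between_type_share` is an illustrative helper, not code from the paper.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def between_type_share(types, scores):
    """Ex-ante IEOp as the share of score variance that lies between types."""
    groups = defaultdict(list)
    for t, y in zip(types, scores):
        groups[t].append(y)
    type_means = {t: fmean(ys) for t, ys in groups.items()}
    # Smoothed distribution: each student is assigned her type's mean score,
    # which removes all within-type variation (attributed to effort or luck).
    smoothed = [type_means[t] for t in types]
    return pvariance(smoothed) / pvariance(scores)


# Hypothetical data: two types, e.g. split by parental education.
types = ["A", "A", "B", "B"]
scores = [500, 520, 440, 460]
share = between_type_share(types, scores)
print(share)  # 0.9
```

Here the type means are 510 and 450, so a variance of 900 out of a total of 1000 lies between types.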
In general, deriving a measure of IOp involves two steps: an Estimation Phase, which transforms the original distribution $[Y_{nm}]$ into a smoothed one, $[\tilde{Y}_{nm}]$, reflecting only the unfair inequality in $[Y_{nm}]$, and a Measurement Phase, which then applies a measure of inequality to the smoothed distribution.12
8Note that this economic model could also be extended to the case of continuous elements in the vectors of circumstances and/or efforts (Ferreira and Peragine, 2015). In this paper, however, circumstances will be discrete.
9For the less frequently used norm-based approach, compare Ramos and Van de gaer (2015).
10Furthermore, as Ferreira and Gignoux (2014) mention, there is also the argument that the allocation of scarce resources, such as investments in a student's education, is only efficient if it is based on a student's talent and not on her circumstances.
11This is consistent with the notion of fairness in a so-called responsibility-sensitive egalitarian perspective (Brunori et al., 2012; Checchi and Peragine, 2010); compare also the reward and compensation principles.
12For the first step, basically two estimation approaches, ex-ante and ex-post, could be taken. However, Fleurbaey and Peragine (2013) show that ex-post and ex-ante compensation are incompatible. If effort is distributed independently of circumstances, however, ex-post and ex-ante EOp are equivalent (see proposition II in Ramos and Van de gaer (2015)).
Following the literature, I adopt an ex-ante, "between-types inequality" approach to measuring IOp, which is in line with the indirect approach (Ferreira and Peragine, 2015), because it relies only on the observed marginal distribution of advantages (test scores), given by the vector $y = \{y_1, \ldots, y_N\}$, and on the joint distribution of advantages and circumstances over the sample population, $\{y, C_n\}$.13 In this regard, I follow the measurement approach of Ferreira and Gignoux (2014), because it requires fewer assumptions (e.g. on how to form tranches without direct observation of efforts). Moreover, given the high requirements for sample size and data availability, applying a non-parametric approach to conduct a within-tranche inequality decomposition (Checchi and Peragine, 2010) would hardly be feasible.14 The finer one tries to design the partition, the smaller the cells may become, which leads to bias in the measure of IEOp. Consequently, this paper adopts a parametric, ex-ante estimation approach to derive EEOp measures. Test scores, denoted by $y$, are a function of circumstances and efforts (denoted by $C$ and $E$, respectively). Using this notation, I model scores as $y = f(C, E)$. Efforts, however, can also depend on circumstances, i.e. $E = E(C)$, which implies $y = f(C, E(C))$, whereas efforts cannot change circumstances. For instance, unobserved innate ability is taken into account in this framework: it is treated as an unobserved circumstance that may influence test scores directly through cognitive skills, but also indirectly via its impact on work ethic and other effort characteristics. Such efforts, however, cannot change other relevant circumstances, such as gender or parental education. Moreover, since the PISA data evaluate students in the 9th grade, the individuals involved are on average about 15 years old. Therefore, they may be regarded as only partially accountable for their choices, if at all, as argued by Gamboa and Waltenberg (2012) or Hufe, Peichl, Roemer, and Ungerer (2015).
In summary, this model for measuring IEOp takes the role of circumstances, efforts and their interplay into account.
Following Ferreira and Gignoux (2014), I use a linear functional form and model the process as follows:
$$y_i = C_i'\beta + E_i'\gamma + \varepsilon_i \qquad (1)$$
$$\text{with} \quad E_i' = C_i'\delta + u_i' \qquad (2)$$
$C_i$ is a vector capturing circumstance variables and $E_i$ is the unobserved vector of $m$ efforts of student $i$. Since the aim is to estimate the full effect of circumstances on scores, i.e. both the direct effect and the indirect effect via efforts, I estimate the reduced-form model:
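$$y_i = C_i'(\beta + \delta\gamma) + (\varepsilon_i + u_i'\gamma) \qquad (3)$$
$$\text{i.e.} \quad y_i = C_i'\rho + z_i, \quad \text{where } \rho = \beta + \delta\gamma \text{ and } z_i = \varepsilon_i + u_i'\gamma \qquad (4)$$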
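A quick simulation illustrates why the reduced form recovers the total effect $\rho = \beta + \delta\gamma$ rather than the direct effect $\beta$. It assumes one scalar circumstance and one scalar effort, standard normal shocks drawn independently of $C_i$, and hypothetical parameter values; it is a sketch of the algebra in equations (1) to (4), not the paper's estimation.

```python
import random

random.seed(42)
beta, gamma, delta = 2.0, 3.0, 0.5  # hypothetical structural parameters
N = 100_000

C = [random.gauss(0.0, 1.0) for _ in range(N)]                             # circumstance
E = [delta * c + random.gauss(0.0, 1.0) for c in C]                        # eq. (2)
y = [beta * c + gamma * e + random.gauss(0.0, 1.0) for c, e in zip(C, E)]  # eq. (1)


def ols_slope(x, z):
    """Slope of an OLS regression of z on x (with intercept)."""
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxz / sxx


rho_hat = ols_slope(C, y)   # scores regressed on circumstances only, as in eq. (4)
rho = beta + delta * gamma  # reduced-form coefficient: 2.0 + 0.5 * 3.0 = 3.5
print(f"rho_hat = {rho_hat:.3f}, rho = {rho}, beta = {beta}")
```

Regressing scores on circumstances alone yields an estimate close to 3.5, so the indirect channel through effort ($\delta\gamma = 1.5$) is attributed to circumstances, as intended for an ex-ante IEOp measure.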
The residual, $z_i$, includes both unobserved efforts and unobserved circumstances. Since the aim at this point is to estimate the mean score of each type conditional on circumstances, one proceeds with:
$$\hat{y}_i = C_i'\hat{\rho} \qquad (5)$$
13In general, this approach may include elements both from a parametric approach, by relying on a linear model of advantages as functions of circumstances/efforts (Bourguignon, Ferreira, and Menéndez, 2007), and from a non-parametric approach, e.g. the between-types inequality decomposition of Checchi and Peragine (2010).
14This approach basically involves four steps. First, one defines the advantage variable. Second, one chooses which variables to use to form types and tranches and, consequently, the respective cells. Third, assuming that ideally the within-type distribution should be the same, one removes the within-cell score inequality to obtain a smoothed distribution of scores. Fourth, one computes total inequality in the smoothed distribution in order to decompose it into a fair and an unfair part.
This creates a new, simulated distribution of scores, $\hat{y} = \{\hat{y}_1, \ldots, \hat{y}_N\}$, with one value for each student. Thus, every $i$ is assigned the value of her opportunity set (which in a linear regression corresponds to the expected score conditional on circumstances). This linear model can be estimated by an Ordinary Least Squares (OLS) regression, which provides the vector of predicted test scores (i.e. the smoothed distribution).
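The Estimation Phase can be sketched without external libraries by solving the OLS normal equations directly. The example assumes two hypothetical binary circumstances and a saturated specification (both dummies plus their interaction, i.e. one parameter per type); in that case the fitted values coincide with the type means, so the parametric smoothing reproduces the non-parametric one. With a more parsimonious specification, fitted values would generally differ from type means. `ols_fitted` is an illustrative helper; real applications would rely on a statistics package.

```python
from collections import defaultdict
from statistics import fmean


def ols_fitted(X, y):
    """Return OLS fitted values X @ rho_hat, solving (X'X) rho = X'y."""
    k = len(X[0])
    # Augmented normal-equation matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting (assumes X'X is invertible).
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    rho_hat = [A[i][k] / A[i][i] for i in range(k)]
    return [sum(x * b for x, b in zip(row, rho_hat)) for row in X]


# Hypothetical students: (female, parent_tertiary) and a test score.
circ = [(0, 0), (0, 0), (1, 0), (1, 0), (0, 1), (0, 1), (1, 1), (1, 1)]
scores = [450, 470, 480, 500, 530, 550, 560, 580]

# Saturated design: intercept, both dummies and their interaction.
X = [[1, f, t, f * t] for f, t in circ]
y_hat = ols_fitted(X, scores)  # smoothed distribution: predicted score given C

groups = defaultdict(list)
for c, s in zip(circ, scores):
    groups[c].append(s)
type_means = [fmean(groups[c]) for c in circ]
print([round(v, 6) for v in y_hat])
print(type_means)
```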
Having assigned to each individual the value of her opportunity set, the second step, the Measurement Phase, involves calculating inequality in this new distribution using a particular inequality index, $I(\cdot)$. To estimate IEOp, one computes the following ratio:
$$\hat{\theta}_{IEOp} = \frac{I(\hat{y}_i)}{I(y_i)} = \frac{I(C_i'\hat{\rho})}{I(y_i)} \qquad (6)$$
i.e. the ratio between inequality due to circumstances (the simulated distribution) and total inequality (the actual distribution of scores). Thus, instead of an absolute measure of Inequality of Opportunity (IOL), in this paper I use a relative measure (IOR) of IEOp. The remaining issue is which inequality index $I(\cdot)$ to use. The literature on EOp in income has used the Mean Log Deviation (MLD) index because of its desirable properties (path independence in particular) (Ferreira and Gignoux, 2011). For the reasons outlined in Ferreira and Gignoux (2014), however, the MLD is not appropriate for measuring inequality in PISA test score data, because it is not ordinally invariant to the standardization of the PISA test scores.15 Instead, these authors show that in this case the most appropriate measure for IEOp is the variance. As an absolute measure of inequality, it is ordinally invariant to the test score standardization, and it satisfies the most important axioms for a meaningful inequality measure, namely (i) symmetry, (ii) continuity and (iii) the transfer principle (see section II in Ferreira and Gignoux (2014)). Overall, the variance thus satisfies the requirements for the proposed IEOp measure. Hence, inequality of opportunity in education (IEOp) can simply be calculated as:
$$\hat{\theta}_{IEOp} = \frac{\operatorname{Var}(\hat{y})}{\operatorname{Var}(y)} \qquad (7)$$
This measure is attractive for several reasons. First, it is simply the $R^2$ from an OLS regression of test scores on the circumstance variables $C$ (compare equation (4)). The only caveat is that this model does not estimate causal effects of individual circumstances: individual elements of $\hat{\rho}$ may be biased due to omitted variables and should not be interpreted as causal effects of certain circumstances on test scores. Second, as shown in Ferreira and Gignoux (2011), the $R^2$ is a meaningful summary statistic, namely a lower bound of the true IEOp.
If one is interested in the total joint effect of all circumstances on educational outcomes as measured by test scores, the object of interest is the percentage of the variation in scores $y$ that is causally explained by the overall effect of circumstances (directly and indirectly via efforts). With efforts treated as unobserved, any omitted circumstance variables, if we observed them, could only lead to a finer partitioning of $[Y_{nm}]$, which would further increase the IEOp measure (for more details, I refer to Hufe and Peichl (2015)). Therefore, the $R^2$ of equation (4), $\hat{\theta}_{IEOp}$ in (7), is a valid lower-bound estimate of the joint effect of all circumstances on educational achievement. In other words, it is a lower bound of the share of overall inequality in educational achievement that can be explained by predetermined circumstances, thus constituting a lower-bound estimate of ex-ante IEOp.16
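The standardization argument can be checked numerically. The sketch below computes the relative measure with the variance and with the MLD, once on hypothetical scores and once after shifting every score by a constant, which mimics a change in the origin of the test score scale. The variance-based ratio is unchanged, whereas the MLD-based ratio moves, because the MLD is scale-invariant but not translation-invariant. Students are smoothed with type means, which equals the OLS prediction under a saturated specification; all helper names are illustrative.

```python
import math
from collections import defaultdict
from statistics import fmean, pvariance


def smooth_by_type(types, scores):
    """Replace each score by the mean score of the student's type."""
    groups = defaultdict(list)
    for t, y in zip(types, scores):
        groups[t].append(y)
    means = {t: fmean(ys) for t, ys in groups.items()}
    return [means[t] for t in types]


def mld(x):
    """Mean Log Deviation: mean of log(mu / x_i); requires strictly positive x."""
    mu = fmean(x)
    return fmean(math.log(mu / xi) for xi in x)


def ieop_ratio(index, types, scores):
    """Relative IEOp: inequality of the smoothed distribution over total inequality."""
    return index(smooth_by_type(types, scores)) / index(scores)


types = ["A", "A", "B", "B"]
scores = [500, 520, 440, 460]        # hypothetical scores on the original scale
shifted = [s - 400 for s in scores]  # same ranking, scale origin moved by 400 points

var_orig, var_shift = ieop_ratio(pvariance, types, scores), ieop_ratio(pvariance, types, shifted)
mld_orig, mld_shift = ieop_ratio(mld, types, scores), ieop_ratio(mld, types, shifted)
print(f"variance ratio: {var_orig:.3f} vs {var_shift:.3f}")  # identical
print(f"MLD ratio:      {mld_orig:.3f} vs {mld_shift:.3f}")  # differs
```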
15Moreover, the authors show that no meaningful inequality index can generate cardinally identical measures for the pre- and post-standardization distributions of identical outcome variables, since any such index fails either scale or translation invariance.
16Niehues and Peichl (2014) outline how an upper bound can be estimated in order to find boundaries for IOp estimates, though this method has not yet been widely applied because of its data requirements (e.g. the need for panel data).
Third, $\hat{\theta}_{IEOp}$ is an IOR of IEOp that is cardinally invariant to the standardization of test scores (Ferreira and Gignoux, 2014). Moreover, one can decompose the IEOp measure into components for each individual variable in the circumstance vector, which is similar to conducting a Shapley-Shorrocks decomposition.
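A generic Shapley decomposition of the IEOp measure is sketched below: each circumstance receives its marginal contribution to theta averaged over all orderings in which circumstances can be added, and the contributions sum to the full theta. The sketch assumes that theta for any subset of circumstances is computed with type-mean smoothing (a saturated model), uses hypothetical data, and is not necessarily the exact decomposition implemented in the paper. Enumerating all orderings is only practical for a few variables, since their number grows factorially.

```python
from collections import defaultdict
from itertools import permutations
from math import factorial
from statistics import fmean, pvariance


def theta(circ, scores, keep):
    """Between-type variance share, with types defined by the circumstance positions in `keep`."""
    if not keep:
        return 0.0  # no circumstances: one single type, no between-type inequality
    keys = [tuple(c[i] for i in sorted(keep)) for c in circ]
    groups = defaultdict(list)
    for k, y in zip(keys, scores):
        groups[k].append(y)
    means = {k: fmean(ys) for k, ys in groups.items()}
    return pvariance([means[k] for k in keys]) / pvariance(scores)


def shapley_shares(circ, scores, names):
    """Average marginal contribution of each circumstance to theta over all orderings."""
    contrib = dict.fromkeys(names, 0.0)
    for order in permutations(range(len(names))):
        included = []
        for i in order:
            before = theta(circ, scores, included)
            included.append(i)
            contrib[names[i]] += theta(circ, scores, included) - before
    return {name: total / factorial(len(names)) for name, total in contrib.items()}


# Hypothetical students: (female, parent_tertiary) and a test score.
circ = [(0, 0), (0, 0), (1, 0), (1, 0), (0, 1), (0, 1), (1, 1), (1, 1)]
scores = [450, 470, 480, 500, 530, 550, 560, 580]
names = ["gender", "parental_education"]

theta_full = theta(circ, scores, [0, 1])
shares = shapley_shares(circ, scores, names)
for i, name in enumerate(names):
    print(f"theta({name} only) = {theta(circ, scores, [i]):.3f}, Shapley share = {shares[name]:.3f}")
print(f"theta(all) = {theta_full:.3f}, sum of shares = {sum(shares.values()):.3f}")
```

The subset values also illustrate the lower-bound argument: dropping a circumstance merges types, so theta for a subset of circumstances never exceeds theta for the full set.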
Finally, as Ferreira and Gignoux (2014) note, $\hat{\theta}_{IEOp}$ can be regarded as isomorphic to a measure of intergenerational persistence. For the latter, following Galton, one usually regresses a child's outcome ($y_{it}$) on the parental outcome ($y_{i,t-1}$):
$$y_{it} = \beta y_{i,t-1} + \varepsilon_{it} \qquad (8)$$
with $\beta$ as the measure of persistence. If one used family background variables instead of the parental outcome variable $y_{i,t-1}$, then the $R^2$ measure of immobility from equation (8) would be similar to $\hat{\theta}_{IEOp}$ (equation (7)), as long as the circumstance vector contains mostly family background variables. In this regard, one may interpret $\hat{\theta}_{IEOp}$ as closely connected to measures of intergenerational educational immobility. To analyze the effect of the increased learning intensity due to the so-called "G-8-reform" on IEOp, this paper applies a DID strategy using $\hat{\theta}_{IEOp}$ from (7) as the outcome variable. The empirical strategy is outlined in section 6. First, however, I briefly review the related literature.
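The connection between the $R^2$ of a Galton-type regression and $\hat{\theta}_{IEOp}$ can be illustrated with hypothetical parent and child outcomes. The sketch adds an intercept to equation (8), which is equivalent to working with demeaned outcomes, and shows that the $R^2$, computed as $\operatorname{Var}(\hat{y})/\operatorname{Var}(y)$ exactly as in (7), equals the squared correlation in the single-regressor case, while the slope $\beta$ measures persistence on a different scale.

```python
from statistics import fmean, pvariance


def simple_ols(x, y):
    """Intercept and slope of an OLS regression of y on x."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


# Hypothetical parental and child outcomes (e.g. years of schooling).
parent = [10, 12, 12, 14, 16, 18]
child = [11, 12, 14, 14, 17, 17]

alpha, beta = simple_ols(parent, child)    # beta: persistence, eq. (8) plus an intercept
fitted = [alpha + beta * p for p in parent]
r2 = pvariance(fitted) / pvariance(child)  # Var(y_hat) / Var(y), as in eq. (7)

mx, my = fmean(parent), fmean(child)
cov = fmean([(a - mx) * (b - my) for a, b in zip(parent, child)])
corr = cov / (pvariance(parent) * pvariance(child)) ** 0.5
print(f"beta = {beta:.3f}, R^2 = {r2:.3f}, corr^2 = {corr ** 2:.3f}")
```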
3 Literature Review
This section first provides a brief overview of the empirical literature estimating Equality of Opportunity (EOp) (or Inequality of Opportunity, IOp). In particular, it shows how this paper contributes to the still limited branch of the literature on EOp with respect to educational outcomes (i.e. EEOp). Second, I illustrate how the quasi-natural experiment exploited in this paper has been studied so far and how this relates, more broadly within the economics of education, to what we know about the impact of changing learning duration/intensity at school on outcomes such as educational achievement. This should make clearer what can be analyzed by relying on this reform.
3.1 Equality of Opportunity (EOp) literature
So far, most studies in the EOp literature focus on estimating inequality of opportunity (IOp) with respect to economic well-being as measured, for instance, by labor earnings, per capita household income or consumption [Ferreira and Gignoux (2011), Checchi and Peragine (2010), Björklund et al. (2012), Almås et al. (2011), Bourguignon et al. (2007), etc.]. In a survey of comparable cross-country studies, Ferreira and Peragine (2015) illustrate that estimates of the share of overall income inequality due to IOp vary considerably (from 2% in Norway to 34% in Guatemala).17 However, in their survey of empirical work measuring IOp in income, Roemer and Trannoy (2015) list common patterns that appear to be robust to differences in the datasets and methods used. Furthermore, as shown by Lefranc et al. (2008), the correlation between IOp and inequality of outcomes is high. Similarly, the intergenerational income elasticity and the Gini coefficient of incomes have been shown to be highly correlated (the Great Gatsby Curve), indicating a link between IOp and intergenerational social mobility (compare equation (8)) (Brunori, Ferreira, and Peragine, 2013).
17 This variation can mainly be attributed to the fact that parametric estimates are a lower bound for the true magnitude of IOp (Ferreira and Gignoux, 2011) and are sensitive to the set of circumstances used. Moreover, assuming that choices made during childhood, i.e. before an age of consent (e.g. 16), are beyond an individual's control, Hufe et al. (2015) show that lower-bound estimates of IOp in incomes may be even higher, e.g. up to 45% in the US.
Concerning the relationship between EOp and economic growth, Marrero and Rodríguez (2013) explain how to incorporate the notion of IOp into macroeconomic studies. For the US, they find evidence of a negative correlation between IOp and economic growth, whereas growth and inequality of efforts are positively correlated. Other papers have examined health outcomes (see Fleurbaey and Schokkaert (2009), Rosa Dias (2014)). They propose that EOp with respect to health implies distinguishing between legitimate (fair) inequality in health (e.g. due to lifestyle decisions such as smoking) and illegitimate (unfair) inequality (e.g. due to SES). Hufe and Peichl (2016) have extended the investigation of EOp to political participation, suggesting that for the US the magnitude of IOp in political participation might be higher than IOp in incomes.18
So far, only a small literature has focused on measuring EOp for educational outcomes, i.e. EEOp. For this purpose, and in particular to obtain comparability across countries and over time, OECD PISA test scores have recently become a common educational outcome variable in this context. Most studies focus on measuring EEOp in developing countries (e.g. Gamboa and Waltenberg (2012)).
The evidence for developed countries is still limited and often only part of cross-country comparisons. For instance, Ferreira and Gignoux (2014) use the 2006 PISA data to investigate EEOp across 57 countries. They find varying degrees of IEOp, both across countries and across the three test subjects within each country. For Germany, they find that about 35% of inequality in test scores is unfair (36.8% for reading, 35.1% for maths, 35.2% for science). Furthermore, these authors show that IEOp is negatively correlated with spending on primary schooling but positively correlated with having a system that tracks students into different secondary school types. Oppedisano and Turati (2015) focus on European countries and, based on PISA 2000 and 2006 data, evaluate how IEOp changed between those years. However, they only use reading scores as the outcome variable and calculate concentration indices to measure IEOp. For Germany, this index declined from 2000 to 2006, as it did for Spain, but not for France or Italy. Finally, conducting an Oaxaca decomposition, the authors suggest that between-school variance is more important in Germany than in the other countries.19 The importance of family background variables for educational achievement is also shown by Carneiro (2008). Using Portuguese PISA 2000 data, the author finds that a student's own parental education, as well as that of her peer group, contributes most to the observed inequality in test scores, which reaches up to 40% (IEOp measure).
Moreover, the persistence of educational status seems to be a channel through which inequality is transmitted to wages.20 However, like Oppedisano and Turati (2015), Carneiro (2008) acknowledges that the framework cannot clearly identify the underlying drivers, while emphasizing that parental education is an important circumstance factor.21 Raitano and Vona (2016) also use PISA data, for the year 2012; jointly taking into account country-level policies, school-level policies and peer effects, they analyze the relationship between the socioeconomic gradient (or EEOp) and the characteristics of the educational systems of various OECD countries.22 They show that grouping students according to their abilities amplifies family background effects (FBEs), whereas putting students with different SES together in the same school reduces the influence of parental background on test scores.
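The concentration indices used by Oppedisano and Turati (2015) can be illustrated with the standard covariance formula $C = 2\,\mathrm{cov}(y, R)/\bar{y}$, where $R$ is each student's fractional rank in the socio-economic distribution. This is a minimal sketch assuming a strictly positive outcome (such as PISA scores) and ties in SES broken by order; Oppedisano and Turati (2015) may use a different variant or ranking variable, and the function name is hypothetical.

```python
import numpy as np


def concentration_index(outcome, ses):
    """Concentration index of `outcome` ranked by `ses`: C = 2 * cov(y, R) / mean(y).

    R is the fractional rank (i - 0.5) / n in the SES distribution; ties are
    broken by the original order. Positive values mean the outcome is
    concentrated among high-SES students, negative values among low-SES ones.
    """
    y = np.asarray(outcome, dtype=float)
    order = np.argsort(np.asarray(ses), kind="stable")
    n = len(y)
    rank = np.empty(n)
    rank[order] = (np.arange(n) + 0.5) / n
    cov = np.mean((y - y.mean()) * (rank - rank.mean()))
    return float(2.0 * cov / y.mean())


# Hypothetical example: scores rising with SES give a positive index
print(concentration_index([1, 2, 3, 4], [10, 20, 30, 40]))  # 0.25
```

An index of zero means that outcomes are unrelated to the SES ranking, so a decline in the index over time, as reported for Germany between 2000 and 2006, points to weaker concentration of reading achievement among advantaged students.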
18 This appears to hold in particular for contacts with officials, monetary contributions to campaigns and membership in organizations.
19 However, taking the full PISA sample of 15-year-olds, they analyze German students from different school tracks, which would partly explain the observed between-school variation. An overview of the German school system is provided in section 4.
20 Intergenerational persistence of educational status is closely linked to the concept of a socioeconomic gradient, i.e. the likelihood of achieving certain educational outcomes given the parents' educational background.
21 Nonetheless, Oppedisano and Turati (2015) suggest that their results provide evidence that decentralized school systems (as in Germany or Spain) may be more beneficial for reducing IEOp than more centralized systems (France or Italy).
22 Raitano and Vona (2016) conduct three estimations, each regressing PISA test scores on individual control variables, country-level policies and individual SES (the family background effect, FBE): the first adds the interaction of country-level policies and FBE; the second includes school-level policies and their interaction with FBE; and the third includes the interaction of FBE and peer variables. They use the PISA index of economic, social and cultural status (ESCS) as the FBE variable. They confirm, for instance, that postponing the tracking age may reduce the socio-economic gradient; however, this effect diminishes when both school sorting policies and the social environment are taken into account.
In this paper, I also use PISA data to measure EEOp. Focusing on Germany and the academic secondary school track, however, I exploit a quasi-experimental setting with the aim of deriving evidence on how the reform changed IEOp. So far, only very few papers have exploited a reform to better isolate the effect of educational policies on EEOp. For instance, Figueroa and Van de gaer (2015) evaluate a social insurance program in Mexico, focusing not only on its effect on school enrollment but also on its effect on inequality in this educational outcome. They provide a simple test of whether a program or intervention reduces IEOp: it evaluates whether the expected outcome conditional on circumstances changes due to treatment, which allows classifying policies as equalizing or not.23 Bratti, Checchi, and de Blasio (2008), by contrast, study an Italian policy of the 1990s that expanded higher education (HE) by allowing educational institutions to open new sites and offer a broader range of degrees. They find that HE expansion had a significant, positive impact on university enrollment, but not on actually completing a degree. Therefore, the reform only slightly reduced IEOp, as graduating from university remained dependent on family SES. In a related study of the Italian tertiary education system, Brunori et al. (2012) analyze the impact of a 2001 reform that established two-year master's degree programs. In particular, they look at how the associated reduction in the time needed to obtain a first-level degree affected EEOp. Using different measurement approaches, they consistently find that IEOp declined between 1998 and the reform year 2001. However, it is not clear whether the improvement in access to tertiary education is a lasting effect, as results after 2001 are mixed. While conducting a similar analysis, I focus on a school reform instead.
In that regard, Edmark, Frölich, and Wondratschek (2014) investigate whether a 1992 Swedish school reform that considerably expanded the possibilities to choose which school to attend24 had heterogeneous effects on students depending on their SES. Exploiting the reform setting in a DID framework, the authors find no evidence of differential treatment effects. The overall estimated effects of the reform on students' outcomes, including long-term labor market outcomes, are small. Thus, the Swedish school reform analysis suggests that EEOp was not affected. Like them, I analyze a school reform, but with a focus on Germany. In that regard, Riphahn and Trübswetter (2013) study educational mobility in East and West Germany after reunification based on the German Mikrozensus (1991-2004). They provide evidence rejecting the hypothesis that educational mobility was initially higher in East than in West Germany (as the socialist legacy might have suggested).25 More generally, Riphahn and Trübswetter (2013) reconfirm the importance of intergenerational persistence in educational achievement and show that after reunification the secondary school system in Germany did not improve with respect to EEOp.26 First, this paper adds evidence on how EEOp changed over time in a developed country, Germany, focusing on secondary school and in particular on the academic track (Gymnasium). To my knowledge, this paper is among the first to evaluate EEOp by combining comparable PISA test score data with the virtues of a quasi-experimental setting to detect causal effects. Finally, as Ramos and Van de gaer (2015) point out in their conclusion, knowledge of how institutions influence EEOp is still limited. This paper aims to contribute to understanding this issue by exploiting a reform that increased learning intensity by reducing school duration and analyzing its impact on IEOp.
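The idea behind Figueroa and Van de gaer's (2015) classification, comparing expected outcomes conditional on circumstances between treated and untreated groups, can be illustrated with a stylised check: a policy counts as equalizing here if it narrows the gap between the best-off and worst-off circumstance type. This is only a hypothetical sketch of the concept under simplifying assumptions (a max-min gap criterion, no sampling error); it is not the test statistic proposed by Figueroa and Van de gaer (2015), and all function names and data are hypothetical.

```python
import numpy as np


def type_means(outcome, types):
    """Mean outcome for each circumstance type (e.g. parental-education group)."""
    outcome = np.asarray(outcome, dtype=float)
    types = np.asarray(types)
    return {t: float(outcome[types == t].mean()) for t in np.unique(types)}


def opportunity_gap(outcome, types):
    """Gap between the best-off and worst-off type in expected outcome."""
    means = type_means(outcome, types)
    return max(means.values()) - min(means.values())


def is_equalizing(y_control, types_control, y_treated, types_treated):
    """Stylised check: does treatment narrow the gap in type-conditional means?"""
    return opportunity_gap(y_treated, types_treated) < opportunity_gap(y_control, types_control)


# Hypothetical enrollment rates by circumstance type
types = ["low", "low", "high", "high"]
control = [0.2, 0.4, 0.8, 0.8]  # gap between type means: 0.5
treated = [0.5, 0.7, 0.9, 0.9]  # gap between type means: 0.3
equalizing = is_equalizing(control, types, treated, types)
```

In applied work the comparison of type-conditional means would be accompanied by statistical inference, which this sketch deliberately omits.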
23 Their proposed method is best suited for the analysis of EEOp in the context of a randomized controlled trial (RCT), as often conducted in developing countries. In addition, decomposition methods are used to identify which groups benefit most from a reform.
24 Due to the reform, students were allowed to attend a public school other than the one in their catchment area. Moreover, privately run but publicly funded schools were allowed, with a voucher system enabling students to attend them free of charge.
25 They show that female students were initially better off in the former German Democratic Republic (GDR, 1949-1990), which comprised today's federal states Brandenburg (BB), East Berlin (BE), Mecklenburg-Western Pomerania (MWP), Saxony (S), Saxony-Anhalt (ST) and Thuringia (TH), but that this relative advantage disappeared after reunification.
26 The OECD (2016) confirms that in Germany only 19% of 25-34-year-olds attain a higher educational degree than their parents.
3.2 Related literature on varying learning intensity
Although the G-8-reform shows that education policy makers consider learning intensity at school to be an adjustable factor, only a few studies have investigated the impact of such a reform. First, empirical work has concentrated on the effects of variations in schooling quantity without considering changes in learning intensity. In that regard, mostly reforms increasing the amount of educational time have been considered. For instance, policies raising compulsory minimum school duration have been exploited to estimate the returns to additional schooling in terms of earnings.27 Second, the impact of differences in instructional time on academic performance has been investigated. Relying on either cross-national or within-country differences in instructional time, such studies mostly find the impact of additional time on standardized test scores to be positive (e.g. Aksoy and Link (2000), Woessmann (2003), Lavy (2015)). Using PISA 2006 data for 50 countries, Lavy (2015) shows that additional schooling time has a significant, positive influence on test scores in mathematics, sciences and reading, with even stronger effects for students with lower SES, which may indicate that additional instructional time has an equalizing effect. Moreover, the fact that the effects of schooling time are significantly lower for developing than for developed countries suggests that the productivity of instructional time depends considerably on the quality of the school system and its environment.28
Only a few studies have analyzed more explicitly the impact of variations in instructional time when curricular content can be assumed to remain constant. In this context, reforms that shortened schooling while keeping curricular content unchanged allow evaluating the impact of increased learning intensity. Krashinsky (2014), for instance, exploits a reform in Canada that reduced the length of high school while keeping both the curriculum and the standards required for the high school diploma unchanged (i.e. learning intensity increased). Focusing on earnings as the outcome variable, the study finds a temporary reduction of about 10 percent in the returns to schooling for students affected by the reform. Nevertheless, the small long-term impact on wages suggests that increased learning intensity might not affect earnings negatively.29 The fact that pre-reform students could choose to complete high school in four or five years, however, raises some doubts about whether the criteria for a quasi-experimental set-up are fulfilled in this study (Meyer, 1995). The results are nevertheless in line with findings by Pischke (2007), who exploits a German reform that moved the start of the school year in all federal states to the autumn and that appears to fulfill the criteria of a quasi-experimental setting. While some states already followed the targeted school year cycle, many had to adapt by implementing two short school years between April 1966 and July 1967. Findings suggest that this reform significantly increased grade repetition and that entrance into the intermediate secondary school track fell by around 10 percent. Nonetheless, only small, negligible effects on earnings persisted. Based on the short-school-year experience, Pischke (2007) therefore predicts that the G-8-reform should not be associated with adverse labor market outcomes for treated students.
Marcotte (2007) estimates how variation in educational intensity affects test scores in mathematics, reading and sciences in the Maryland School Performance Assessment Program (MSPAP) by exploiting an unusual natural experiment.30
27 E.g. Angrist and Krueger (1991) and Grenet (2013) find significant, positive effects of additional schooling on earnings as long as new compulsory minimum schooling laws affect students who would otherwise drop out of school without a degree.
28 However, the study cannot distinguish between a pure time effect and a knowledge effect due to more content being taught in more time.
29 Whether this is because schooling works primarily as a signal, or because increased educational intensity compensates for reduced schooling quantity in human capital accumulation, remains an open question.
30 With snowfall varying approximately randomly across Maryland each year, snow-related school closures create random variation in the number of available school days for students in the same grade in each school year. This creates quasi-experimental variation among students in the time available to prepare for the MSPAP test.
He finds positive, significant effects of additional school days on performance, largest for mathematics. He interprets the differences in effects across subjects as consistent with subject-specific curricular flexibility. For instance, assuming that mathematics has a rather inflexible curriculum, a reduced number of school days could be compensated less easily by increasing learning intensity in mathematics than in other subjects.31
In this paper, I exploit the G-8-reform as a quasi-experimental setting to study the effect of increased learning intensity on EEOp (see section 4). Despite the public controversy about this reform, which has even induced some federal states to partially reverse it (see Table A.2), only a few scientific studies have evaluated the G-8-reform and its effects on outcomes such as educational achievement. First, a few studies have analyzed the reform by comparing Gymnasium-8-model (G-8-model) cohorts and Gymnasium-9-model (G-9-model) cohorts within one federal state. To begin with, in most federal states the statistical offices have compared grades in the central exit examinations (Abitur) of students in the double cohort, that is, the year in which both the last G-9-model and the first G-8-model cohorts graduated from Gymnasium (compare Table A.2). Generally, these statistical evaluations have found no systematic performance differences in central exit exams between students with 8 or 9 years of secondary school.32 Furthermore, for the federal state of Saxony-Anhalt (ST), a small series of papers (Büttner and Thomsen, 2015; Thiel, Thomsen, and Büttner, 2014; Meyer and Thomsen, 2016) has analyzed aspects of the G-8-reform. In particular, they analyze the reform's effects on academic achievement, using as outcome variable the results of the 2007 central exit examinations, when the double cohort graduated in Saxony-Anhalt (ST). Findings suggest that, due to more intense schooling, exam achievement in mathematics deteriorated significantly, whereas it remained unaffected in German literature, indicating that the consequences of increased learning intensity differ across subjects.
Moreover, no significant negative effects on students' non-cognitive skills are detected, contradicting claims that increased learning intensity and the associated reduction in time for non-school activities may have adversely affected non-cognitive skill formation. In line with this result, Milde-Busch, Blaschek, Borggräfe, von Kries, Straube, and Heinen (2010) reject the hypothesis that, from a medical point of view, the more intense schooling experience had significant effects on students' stress levels. However, due to reduced leisure time, G-8-model students were less able to relax than their peers in the G-9-model. Finally, Meyer and Thomsen (2016) find no negative effects of the G-8-reform on the ability, motivation and likelihood to pursue university education.33
Recently, a few papers have started to use more representative data (e.g. PISA data), which may be less dependent on school-system-specific characteristics and on the relative-performance measurement issues that arise with school grades. Moreover, by identifying the effect of the G-8-reform from the variation in its implementation across federal states and over time, and by applying methods that exploit this variation (e.g. DID), this approach can address shortcomings of previous studies, such as state-specific trends. For instance, the two studies most comparable to this paper not only exploit a setting with multiple
31 A few similar studies exploit quasi-random variation in instructional time (e.g. due to the timing of the school year or teacher absences) and usually find similar positive effects of additional time on test scores. Marcotte (2007) is an illustrative example of a study exploiting quasi-experimental variation in instruction time while the curricular requirements remain unchanged, as a method to analyze learning time/intensity effects.
32 For instance, some federal states observed no difference (Saarland (SL), North-Rhine-Westphalia (NRW)); in some, the G-9-model students performed slightly better (Baden-Wuerttemberg (BW)); in others, the opposite was the case (Hesse (HE), Berlin (BE)); and in some, results differed between the two groups depending on the subject (Bavaria (BV)).
33 The reform did, however, influence post-secondary decisions: for instance, they find significant delays in starting a first university degree among female students who graduated from a G-8-model school, because these students are now more likely to first complete a type of vocational education. Moreover, in the questionnaire, students report that they continue their hobbies despite the G-8-reform, but also that they work less outside school and receive more extra tuition.
federal states and several time periods, but also use the same outcome variables for educational achievement as this paper: the standardized PISA test scores in reading, mathematics and sciences of academic-track ninth-graders.34 First, Andrietti (2015) uses PISA data from 2000 to 2009, representative of the 16 German federal states, to exploit the G-8-reform in a DID estimation of the effects of increased learning intensity on test scores.35 He finds that the average treatment effect of the reform is significant and positive for all three educational outcomes.36 In contrast to Huebener and Marcus (2015), Andrietti (2015) finds no evidence of a significant increase in overall grade retention rates; instead, his results suggest that grade repetition may have increased only for boys and for students with a migration background. This may indicate that the G-8-reform caused distributional changes in educational outcomes and thus affected EEOp; however, Andrietti (2015) does not address this issue directly. In summary, the author shows that students might benefit from increased instructional time despite higher learning intensity.37 Second, Huebener, Kuger, and Marcus (2016) additionally include the PISA 2012 data, which allow more federal states to be included in the analysis. Based on state regulations of secondary school timetables, they first show that the G-8-reform increased weekly instruction hours for the average treated student by about 6.5 percent over a period of almost 5 years. The authors then suggest that this increased instruction time improved average student performance in all three PISA testing domains. However, the effect size is small, at about 6 percent of a standard deviation in test scores.
Moreover, the positive effects are insignificant for low-performing students, whereas their high-performing peers experience significant positive effects, which is indicative of a widening performance gap among Gymnasium students. In that regard, Huebener et al. (2016) focus on the effect of increased instruction time, whereas Andrietti (2015) puts more emphasis on the increased learning intensity aspect of the reform. In this paper, I use data similar to those of Huebener et al. (2016), with PISA test scores from 2000 to 2012. However, regarding the reform effect, I follow Andrietti (2015) in emphasizing the effects of increased learning intensity. While these studies shed light on the direct effect of the reform on test scores, they do not address whether and how the reform, by increasing learning intensity, may have changed EEOp.
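In the two-step approach described in the abstract, an IEOp measure such as $\hat{\theta}_{IEOp}$ would first be computed for each state and PISA wave, and these cells would then enter a DID regression exploiting the staggered timing of the reform. The sketch below shows a generic two-way fixed-effects DID (state dummies, year dummies and a treatment indicator) on simulated, noise-free cells; it is not the specification of section 6. The state labels A-D, the treatment timing and all effect sizes are hypothetical, and the sketch assumes a constant treatment effect, since two-way fixed-effects estimates under staggered adoption can be biased when effects vary over time.

```python
import numpy as np


def twfe_did(outcome, state, year, treated):
    """DID coefficient from OLS of outcome on state dummies, year dummies and a treatment indicator."""
    y = np.asarray(outcome, dtype=float)
    state = np.asarray(state)
    year = np.asarray(year)
    states, years = np.unique(state), np.unique(year)
    state_fe = (state[:, None] == states[None, 1:]).astype(float)  # first state is the reference
    year_fe = (year[:, None] == years[None, 1:]).astype(float)     # first year is the reference
    X = np.column_stack([np.ones(len(y)), state_fe, year_fe, np.asarray(treated, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[-1])


# Hypothetical state-by-wave cells of an IEOp measure (simulated, no noise)
pisa_years = [2003, 2006, 2009, 2012]
first_treated = {"A": None, "B": 2006, "C": 2009, "D": 2012}
state_effect = {"A": 0.10, "B": 0.12, "C": 0.08, "D": 0.11}
year_effect = {2003: 0.00, 2006: 0.01, 2009: 0.02, 2012: 0.015}
true_effect = 0.03

rows = []
for s, start in first_treated.items():
    for t in pisa_years:
        d = int(start is not None and t >= start)
        rows.append((state_effect[s] + year_effect[t] + true_effect * d, s, t, d))

theta, state, year, treated = map(list, zip(*rows))
estimate = twfe_did(theta, state, year, treated)
```

Because the simulated cells follow the additive model exactly, the estimate recovers the assumed effect; with real data, each cell would carry sampling error from the first-step estimation.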
In this paper, I first try to shift the emphasis in the analysis of the G-8-reform to distributional concerns, i.e. its consequences for IEOp. In other words, this paper aims to answer the question whether the G-8-reform can be considered a selective reform, i.e. one that at least maintains test score results while increasing IEOp, or an inclusive reform, i.e. one that at least maintains test score results while increasing EEOp (Checchi and Van De Werfhorst, 2014). To my knowledge, this paper may be among the first to evaluate the G-8-reform based on German-specific PISA test scores in order to analyze its impact on EEOp. Thus, the second aim of this paper is to contribute to the literature on EEOp (compare section 3.1) by providing some evidence on a potential policy channel at the school level: learning intensity. Finally, this paper aims to draw attention to the largely neglected factor of learning intensity, which has implications for both the effectiveness and the efficiency of (non-)cognitive skill formation.
34 In my 2012 Bachelor thesis, "Does shortening secondary school duration affect student achievement and educational equality? Evidence from a natural experiment in Germany: the ’G-8 reform’", I, Sebastian Camarero Garcia, already used PISA reading test scores as the outcome variable in a DID framework to analyze the effects of the G-8-reform on cognitive skills, finding a positive effect of about 0.15 standard deviations in test scores, with stronger effects for students with a migration background (Camarero Garcia, 2012).
35 As explained in section 5, to ensure consistency and comparability across the PISA studies used, I rely on PISA-I samples for all years and do not mix PISA-E and PISA-I samples.
36 Treated students in a G-8-model experience an improvement of about 0.095-0.145 standard deviations in PISA test scores. Furthermore, the author tries to estimate the effects of the approximate pure increase in instruction time on test scores and finds similar results: a twenty-hour increase distributed over grades 5-9 or a ten-hour increase distributed over grades 8-9 correspond on average to an improvement of 0.08-0.15 standard deviations, respectively, depending on the subject.
37 This is in line with studies finding a positive impact of additional instructional hours when learning intensity is kept constant.
4 The “G-8-reform”: shortening secondary school duration
The goal of this section is to explain the institutional background and implementation of the G-8-reform in order to illustrate how this reform can be exploited to set up a quasi-experimental estimation approach. This allows analyzing the effect of increased learning intensity on a measure of IEOp as described in section 2.
4.1 Institutional background: The German school system
Reflecting Germany's federal structure, educational policy is the responsibility of each federal state (Bundesland or Land). While secondary school systems can differ across German federal states, most features are comparable.
School usually starts at the age of six, when students enter primary school for a period of four years. Only in Berlin (BE) and Brandenburg (BB) does primary school, which also starts at the age of six, comprise grades 1 to 6. After primary school, students enter a tripartite secondary school system in which the choice of track is determined by their previous academic performance.38 Both the shortest track of secondary school, Hauptschule, and the intermediate track, Realschule, allow graduates to pursue apprenticeship programs after a total of 9 or 10 years of schooling. University access is restricted to those gaining the Abitur, the university access diploma, by completing the academic track, Gymnasium, which before the G-8-reform lasted 9 years. Thus, including primary school, students obtaining the university entrance qualification used to graduate after 13 years. Some federal states, however, have awarded the Abitur after 8 years of secondary schooling (12 years in total) for several decades. In fact, the federal states that were part of the former GDR had originally developed a secondary school system different from that of the West German federal states: all students were taught together until the 10th grade, after which they could either follow vocational training or obtain the Abitur after completing two additional years of Gymnasium. Although in the process of German reunification most East German federal states adopted the West German standard of 13 years of schooling to obtain the university access diploma, i.e. a Gymnasium-9-model, Saxony (S) and Thuringia (TH) decided to maintain the Gymnasium-8-model.
Nonetheless, coordination through the Standing Conference of the Ministers of Education and Cultural Affairs of the Länder in the Federal Republic of Germany (SC) has initiated a framework for ensuring comparable nationwide academic standards.39 For a more detailed overview of the German education system, I refer the interested reader to Figure A.2 in Appendix A.4 (see also Standing Conference of the Ministers of Education (2015)).40
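The cohort arithmetic behind the reform can be sketched as follows: with four years of primary school, a G-9 cohort graduates after 13 years in total and a G-8 cohort after 12, so the last G-9 cohort and the first G-8 cohort, which entered Gymnasium one year later, sit the Abitur in the same year (the double cohort). The same arithmetic determines whether a ninth-grader tested by PISA already belongs to a G-8 cohort. This is a minimal sketch under hypothetical assumptions: the reform applies to every cohort entering grade 5 from a given school year onward, there is no grade repetition, and PISA is taken in spring. Actual implementation details differ across states (Table A.2), and in BE and BB Gymnasium usually starts after grade 6. All names and the example reform year are hypothetical.

```python
from typing import Optional

PRIMARY_YEARS = 4  # grades 1-4 in most states (grades 1-6 in BE and BB)


def gymnasium_years(entry_year: int, reform_entry_year: Optional[int]) -> int:
    """Years of Gymnasium for a cohort entering grade 5 in autumn of entry_year."""
    if reform_entry_year is not None and entry_year >= reform_entry_year:
        return 8  # G-8-model
    return 9      # G-9-model


def abitur_year(entry_year: int, reform_entry_year: Optional[int]) -> int:
    """Calendar year (spring) in which the cohort sits the Abitur."""
    return entry_year + gymnasium_years(entry_year, reform_entry_year)


def double_cohort_year(reform_entry_year: int) -> int:
    """Year in which the last G-9 cohort and the first G-8 cohort graduate together."""
    last_g9 = abitur_year(reform_entry_year - 1, reform_entry_year)
    first_g8 = abitur_year(reform_entry_year, reform_entry_year)
    assert last_g9 == first_g8  # sanity check: both cohorts graduate in the same year
    return first_g8


def g8_at_pisa(pisa_year: int, reform_entry_year: Optional[int]) -> bool:
    """Is a ninth-grader tested in spring of pisa_year (no grade repetition) in a G-8 cohort?"""
    entry_year = pisa_year - 5  # grade 9 in school year (pisa_year-1)/pisa_year
    return gymnasium_years(entry_year, reform_entry_year) == 8


print(double_cohort_year(2004))  # 2012
```

For a hypothetical state whose first G-8 cohort entered grade 5 in autumn 2004, ninth-graders tested in PISA 2009 belong to a G-8 cohort, whereas those tested in PISA 2006 do not.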
38 This may be regarded as a simplified illustration of the track selection process: in fact, during grade 4 (or 6) primary schools issue a recommendation for each student as to which track would be suitable, based mainly on the student's performance and progress during primary school. These recommendations used to be binding in many federal states during the period considered in this study. For more information on the tracking system, compare Dustmann, Puhani, and Schönberg (2014). An overview of recent regulations of the individual federal states regarding the transition from primary to lower secondary education is available on the website of the Standing Conference (http://www.kmk.org/fileadmin/Dateien/veroeffentlichungen_beschluesse/2015/2015_02_19-Uebergang_Grundschule-SI-Orientierungsstufe.pdf), and for the period considered in this paper at (https://www.kmk.org/fileadmin/Dateien/veroeffentlichungen_beschluesse/2006/2006_03_01-Uebergang-Grundschule-Sek1.pdf).
39 The SC, the conference of the Secretaries of Education and Cultural Affairs of all 16 federal states, has, for example, adopted the Uniform Examination Standards for the Abitur examination in October 2007.
40 In addition to Hauptschule, Realschule and Gymnasium, several federal states have recently started to provide a type of comprehensive school (Integrierte Gesamtschule or Schularten mit mehreren Bildungsgängen). In these comprehensive schools, students are not tracked into specific academic paths after primary school but can graduate after 9, 10 or 13 years of schooling. However, the vast majority of students obtaining the Abitur still attend a Gymnasium for this purpose.
4.2 The “G-8-reform” and its implementation
The first PISA study in 2000 received broad public attention in Germany because it revealed that, within the OECD, German students achieved only below-average test scores in the three basic competences reading, mathematics and sciences (the so-called “PISA shock”). Debates about improving the German school system ensued (cf. Ertl (2006), Anderson, Fruehauf, Pittau, and Zelli (2015), Ammermueller (2007)). Among the reform proposals, shortening the academic track in secondary school (Gymnasium) from 9 to 8 years, the “G-8-reform”, remains controversial to this day.41 Three main reasons were given to justify its introduction. First, the reform aimed to reduce the relatively high age of university graduates in Germany. On the one hand, this was expected to increase their competitiveness on the labor market relative to the (on average) younger graduates in other OECD countries (cf. OECD (2005a)). On the other hand, with students entering the job market one year earlier, working lifetime would be extended, so the reform was said to contribute to stabilizing the social security system of a society facing demographic change.42 Second, since the most successful countries in the PISA ranking, e.g. Finland, had a school system with 12 years of schooling, shorter schooling appeared to be both successful and efficient. Third, the “G-8-reform” was regarded as a necessary adjustment of secondary school in view of the harmonization of tertiary education across Europe. As Büttner and Thomsen (2015) illustrate, the reform shortening secondary school duration was also enacted with respect to the Bologna Process. This European initiative aims at creating a European Higher Education Area (EHEA) providing a more comparable, flexible European framework for tertiary education. For this purpose, adjusting secondary school duration towards the average of European counterparts appeared sensible.
Moreover, the reform was expected to give the then younger school graduates an incentive to pursue a university degree, thereby raising Germany's rate of university graduates per birth cohort, which was below the OECD average.
Opponents, however, have argued that the reform-induced intensification of education might even narrow educational opportunities by strengthening the correlation between educational achievement and socio-economic family background, which in Germany is already above the OECD average, as indicated by the so-called socio-economic gradient.43 If background factors, e.g. parental support as a resource for coping with intensified instruction at school, gained importance in skill formation, such fears might be justified. Furthermore, parental complaints about increased stress for students due to reduced free time underline the fear of negative effects on both academic performance and the development of non-cognitive skills typically formed through non-academic leisure activities (Thiel et al., 2014).
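The socio-economic gradient can be summarized in more than one way: footnote 43 describes it as a correlation between parents' and children's educational achievement, while OECD PISA reports often use the slope of test scores on a socio-economic index. The following Python sketch is not the estimator used in this paper; it computes both summaries for hypothetical data, and the function name `socio_economic_gradient` and all example values are illustrative assumptions.

```python
import numpy as np


def socio_economic_gradient(child_score, parent_measure):
    """Return (correlation, slope) of child achievement on a parental measure.

    correlation: Pearson correlation, i.e. the strength of the association
    slope: OLS slope, i.e. score points per unit of the parental measure
    """
    y = np.asarray(child_score, dtype=float)
    x = np.asarray(parent_measure, dtype=float)
    correlation = float(np.corrcoef(x, y)[0, 1])
    # The OLS slope of y on x equals cov(x, y) / var(x)
    slope = float(np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1))
    return correlation, slope


# Hypothetical example: parental years of schooling and child test scores
parent_years = [9, 10, 12, 13, 13, 16, 18]
child_scores = [505, 530, 540, 575, 560, 600, 610]
corr, slope = socio_economic_gradient(child_scores, parent_years)
print(f"correlation = {corr:.2f}, slope = {slope:.1f} points per year of parental schooling")
```

A higher correlation or a steeper slope would indicate that achievement is more strongly tied to family background, which is the concern raised by the reform's opponents.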
Beginning in 2001, all 14 federal states with a Gymnasium-9-model gradually decided to shorten secondary school duration from 9 to 8 years. In each federal state, the reform process took 8 years to transform all grades of Gymnasium and ended with the graduation of a double cohort, consisting of the first G-8-model and the last G-9-model student cohort, which had to pass the same final exams (the Abitur exam) (Huebener and Marcus, 2015; Standing Conference of the Ministers of Education, 2016c). As illustrated in Table A.2, the federal states implemented the G-8-reform one after the other between the school years 2001/2002 and 2008/2009, with the associated double cohorts graduating between 2006/2007 and 2015/2016 (see also Figure A.3 and Figure A.6 in Appendix A.4).
41 I refer to the last column in Table A.2 for an overview of the status quo of the reform as of school year 2015/16.
42 Younger university graduates would pay social security contributions earlier and over a longer time span, which helps to stabilize the system.
43 In this context, the socio-economic gradient refers to the correlation between parents' and their child's educational achievements. It is used as a descriptive measure of the extent to which educational outcomes are "inherited" (Carneiro, 2008). International studies show that the German socio-economic gradient is relatively high (e.g. Prenzel, Artelt, Baumert, Blum, Hammann, Klieme, and Pekrun (2006); OECD (2005a)).
In general, the SC of education ministers decided that the standards for the university access diploma (Abitur) were not to be lowered in response to the reform. Instead, the minimum amount of teaching required before a student can graduate from Gymnasium was kept at a total of at least 265 weekly periods, summed over all grades of Gymnasium (Standing Conference of the Ministers of Education, 2016c). This was meant to guarantee a comparable quality standard of the Abitur across all 16 federal states, independent of schooling duration. Consequently, curricular content originally taught in seven years (from 5th to 11th grade) was now distributed across six years (from 5th to 10th grade), such that students in the G-8-model were supposed to enter the Gymnasiale Oberstufe, the final two school years of Gymnasium, as if they had completed the original 11th grade. The reasoning was that these final two years (also called the qualification phase)44 prepare students for the Abitur with an already comprehensive curriculum, leaving little room to add further content. Furthermore, the first G-8-model cohort in each federal state was planned to enter a common qualification phase together with the last G-9-model cohort. For this reason, the last two years of Gymnasium were kept unchanged, and the “lost” year had to be compensated during grades 5-10 (grades 7-10 in BE and BB). Thus, it is plausible to assume that curricular content was not reduced for the first student cohorts affected by the “G-8-reform” in any of the federal states.45
The fact that education ministers only began to seriously consider revising the curricular content of the G-8-model, compared to the previous G-9-model, around 2010 does not affect the validity of this assumption, as it would only influence the reform effect for later “G-8-student-cohorts” (after 2012).46 In order to keep the total minimum number of weekly periods unchanged in the new G-8-model, instructional time increased by 2-4 weekly periods per grade (during grades 5-9) for affected students compared to previous cohorts in the G-9-model Gymnasium (Standing Conference of the Ministers of Education, 2013). This is also shown by Huebener et al. (2016), who collected the binding timetable regulations of each federal state and illustrate the change in the distribution of average weekly instruction hours (over grades 5-9) in Figure A.7. However, the loss of one full school year was not completely compensated by additional instructional time. To limit the amount of afternoon schooling in grades 5 and 6, for example, some hours originally planned for revision were dropped.47 Since curricular content was not reduced for the first cohorts, the amount of material that had to be covered per week in each grade increased. Teaching therefore had to become more compact, i.e. it had to convey more content within a secondary school duration shortened by one year. In conclusion, it is plausible to assume that the “G-8-reform” exogenously increased learning intensity, defined as the amount of curricular content covered in a given period.
44 Only the marks from the final two years, together with the marks in the Abitur exam, form the overall mark on the Abitur diploma.
45 This could be interpreted as evidence that students in the G-8-model in principle did not lose any overall taught knowledge compared to students in the G-9-model, because they had to pass the same final two years, including the final Abitur exams, and thus had to learn the same material beforehand in 6 instead of 7 school years.
46 For this reason, since this analysis only includes data up to 2012, changes in the implementation of the G-8-reform as indicated in the last column of Table A.2 do not affect the student cohorts analyzed in this paper (see also Section 5).
47 Andrietti (2015) offers the following broad calculation based on the regulations set by the SC and grade-level state-specific data on weekly hours. Note that this is only an approximation for an average student, as the exact hourly impact depends on the federal state and school a G-8-model student attended. Nevertheless, I cite footnote 2 in Andrietti (2015), as it illustrates the approximate change caused by the G-8-reform for affected 5th graders (see also Figure A.8): "By the end of grade 9, G8 students have covered the curriculum corresponding to 6,460 (265/8 per week over 39 weeks for five grades) of the 10,335 instructional hours required for graduation. This means that they have accumulated on average 720 more instructional hours and only 430 less hours than G9 students at the end of grade 9 (265/9 per week over 39 weeks for five grades, i.e., 5,740 hours) and grade 10 (6,890 hours), respectively."
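The arithmetic quoted in footnote 47 can be reproduced in a few lines. The Python sketch below assumes, as the cited calculation does, that the 265 weekly periods are spread evenly across the 8 (G-8) or 9 (G-9) years of Gymnasium and that a school year has 39 teaching weeks; the constant and function names are illustrative, not taken from Andrietti (2015). It also shows that an even spread implies roughly 3.7 additional weekly periods per grade, consistent with the 2-4 additional weekly periods reported above.

```python
# Hypothetical re-computation of the approximation cited from Andrietti (2015), footnote 2.
TOTAL_WEEKLY_PERIODS = 265  # minimum weekly periods, summed over all grades of Gymnasium
WEEKS_PER_YEAR = 39         # teaching weeks per school year assumed in the calculation


def hours_by_end_of_grade(years_of_gymnasium: int, grades_completed: int) -> float:
    """Cumulative instructional hours if the 265 weekly periods are spread evenly."""
    weekly_periods_per_grade = TOTAL_WEEKLY_PERIODS / years_of_gymnasium
    return weekly_periods_per_grade * WEEKS_PER_YEAR * grades_completed


total_required = TOTAL_WEEKLY_PERIODS * WEEKS_PER_YEAR  # hours required for graduation
g8_end_of_grade_9 = hours_by_end_of_grade(8, 5)   # G-8, grades 5-9
g9_end_of_grade_9 = hours_by_end_of_grade(9, 5)   # G-9, grades 5-9
g9_end_of_grade_10 = hours_by_end_of_grade(9, 6)  # G-9, grades 5-10
extra_weekly_periods = TOTAL_WEEKLY_PERIODS / 8 - TOTAL_WEEKLY_PERIODS / 9

print(f"required for graduation: {total_required:,}")
print(f"G-8, end of grade 9:     {g8_end_of_grade_9:,.0f}")
print(f"G-9, end of grade 9:     {g9_end_of_grade_9:,.0f}")
print(f"G-9, end of grade 10:    {g9_end_of_grade_10:,.0f}")
print(f"G-8 lead at grade 9:     {g8_end_of_grade_9 - g9_end_of_grade_9:,.0f}")
print(f"G-8 gap to G-9 grade 10: {g9_end_of_grade_10 - g8_end_of_grade_9:,.0f}")
print(f"extra weekly periods per grade: {extra_weekly_periods:.1f}")
```

Rounded to the nearest ten, the printed values correspond to the 6,460, 5,740, 6,890, 720 and 430 hours quoted in footnote 47.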
4.3 Identification strategy: The quasi-experimental set up of the reform
The G-8-reform and its staggered implementation at the federal state level can be exploited as a quasi-experimental setting to derive the reform effect on a measure of IEOp (see section 2). This requires allocating the 16 federal states to Treatment (T) and Control (C) groups for each PISA test year. Table 1 illustrates how T- and C-Groups can be formed based on the timing of the reform implementation across federal states.48 After describing the data used in this paper (section 5), I use the following definitions in section 6 to explain the specific empirical strategy employed in my analysis. To begin with, I define four models based on the time period included in the analysis:
- Baseline-Model, medium-term perspective (Base-MT): covers the testing period from 2003 until 2012
- Extended-Model, medium-term perspective (Full-MT): covers the testing period from 2000 until 2012
- Baseline-Model, short-term perspective (Base-ST): covers the testing period from 2003 until 2009
- Extended-Model, short-term perspective (Full-ST): covers the testing period from 2000 until 2009
For the medium-term models, the following T/C-Groups with reform-time set between 2006 and 2009 exist.
• Neither T nor C: Five federal states cannot be classified into either the T- or the C-Group: Saarland (SL), North Rhine-Westphalia (NRW), Hesse (H), Saxony-Anhalt (ST) and Mecklenburg-West-Pomerania (MWP). First, Saarland, the first Western state to implement the reform (2001/2002), has to be excluded because its 9th graders were already taught in a G-8-model Gymnasium in both 2006 and 2009. Having changed school duration earlier than most other states, it is neither a clean T- nor a clean C-Group in the Base-MT or Full-MT setting with the reform time set between 2006 and 2009. Similarly, in the medium-term perspective (which includes 2009 and 2012), NRW has to be excluded, as its 9th graders were still taught in a G-9-model in both 2006 and 2009, so that the reform affects tested students only from 2012 onwards. Thus, NRW is neither a clean C- nor a clean T-Group state in the Base-MT model. Hesse has to be excluded for the same reason.49 Furthermore, the two Eastern states Saxony-Anhalt and Mecklenburg-West-Pomerania implemented the reform in a way that differs considerably from the other federal states.50 As they were already treated in 2006, they cannot serve as T-Group in a DID with the reform time set between 2006 and 2009.
• According to Table 1, seven federal states can be classified as T-Group, because their tested 9th graders were in a G-8-model only from 2009 onwards: Treatment-Group (T7): Baden-Wuerttemberg (BW), Bavaria (BV), Lower-Saxony (LS), Bremen (BR), Hamburg (HB), Berlin (BE), Brandenburg (BB). However, as the Eastern federal states were part of the former GDR, they are likely to still differ from Western states, for instance because some of their teachers were trained in the GDR. Focusing only on Western states yields Treatment-Group (T5): BW, BV, LS, BR, HB. Finally, the most conservative setting is the Treatment-Group (T3): BW, BV, LS. It consists only of Western territorial states, because states covering larger, heterogeneously populated areas plausibly differ in some inherent characteristics from city states.
48 Note that in the main specifications, the reference point for the G-8-reform is set between 2006 and 2009, since for 7 out of 13 reforming federal states, the 9th graders taking the PISA 2009 test were the first cohorts affected by the reform. Setting the reform time between the test years 2006 and 2009 is therefore the most convenient choice for a DID estimation approach.
49 Hesse is the only Land that did not introduce the reform for all Gymnasien at the start of a single school year, but successively over three years. One could therefore still classify it as C-Group in 2009, when only about 10% of tested students had been treated. But then the reform only reaches tested students by 2012, so that Hesse is C-Group in both 2006 and 2009, when 9th graders were still taught in a G-9-model Gymnasium. Hence it has to be excluded as neither T- nor C-Group in the Base/Full-MT model.
50 They applied the treatment starting with students who were already in the 9th grade (not, as most other states did, only from the 5th grade onwards), so that the reform impact was plausibly stronger than in Western states.
16 Table 1: "G-8-reform" Treatment/Control-Group allocation of PISA cohorts per state
PISA cohorts affected (if) Treatment cohort/grade affected reform double Treated Federal state enaction cohort grade 2000 2003 2006 2009 2012 2006 2009 2012
2004/2005 2010/2011 6 but first 6th cohort treated not in 9th grade in a PISA test year Bavaria (BV) - 1st cohort 4th cohort 2004/2005 2011/2012 5 CCCTT 5th graders 5th graders 5th graders 2004/2005 2010/2012 6 but first 6th cohort treated not in 9th grade in a PISA test year Lower-Saxony (LS) - 1st cohort 4th cohort 2004/2005 2011/2013 5 CCCTT 5th graders 5th graders 5th graders Baden- - 1st cohort 4th cohort Wuerttemberg 2004/2005 2011/2012 5 CCCTT (BW) 5th graders 5th graders 5th graders
Rhineland- --- Palatinate 2008/2009 2015/2016 5 CCCCC (RP) 5th graders 5th graders 5th graders
Schleswig- --- Holstein 2008/2009 2015/2016 5 CCCCC (SH) 5th graders 5th graders 5th graders
North Rhine- - - 3rd cohort Westphalia 2005/2006 2012/2013 5 CCCCT (NRW) 5th graders 5th graders 5th graders
- 3rd cohort 6th cohort Hamburg (HB) 2002/2003 2009/2010 5 CCCTT 5th graders 5th graders 5th graders
- 1st cohort 4th cohort Bremen (BR) 2004/2005 2011/2012 5 CCCTT 5th graders 5th graders 5th graders
- 1st cohort 4th cohort Berlin (BE) 2006/2007 2011/2012 7 CCCTT 7th graders 7th graders 7th graders
- 1st cohort 4th cohort Brandenburg 2006/2007 2011/2012 7 CCCTT (BB) 7th graders 7th graders 7th graders
2006/2007 9 2007/2008 8 1st cohort Saxony-Anhalt 2003/2004 2008/2009 7 CCTTT 7th graders (ST) 2009/2010 6 2nd cohort 5th cohort 2010/2011 5 CCCTT 5th graders 5th graders 2007/2008 9 Mecklenburg- 2008/2009 8 CCTTT 1st cohort Western 2004/2005 2009/2010 7 8th graders Pomerania 2010/2011 6 1st cohort 4th cohort (MWP) 2011/2012 5 CCCTT 5th graders 5th graders
Saxony (SN) since 1949 5 C2 C2 C2 C2 C2 hypothetical control group: always in treatment
Thuringia (TH) since 1949 5 C2 C2 C2 C2 C2 hypothetical control group: always in treatment
1st cohort 4th cohort 7th cohort Saarland (SL) 2001/2002 2009/2010 5 CCTTT 5th graders 5th graders 5th graders
2004/05 2011/2012 5 CCC (T) T - (less than 10%) 2nd/3rd/4th coh. 2005/06 2012/2013 5 CCC C T - 1st cohort 2nd/3rd/4th c. Hesse (H)a 2006/07 2013/2014 5 CCC C T - - 2nd/3rd/4th c. 5 5th graders 5th graders 5th graders a Hesse(H) only introduced the reform gradually across 3 school years (compare Table A.2 and Figure A.6 as well as for reg. settings Figure A.4/ Figure A.5). Notes: In this Table, Treatment/Control-Groups are highlighted by rectangular boxes in the following way: For the Base/Ext-ST/MT Models: Treatment T3 ≡ red rectangle, T5 ≡ red + magenta rectangle and T7 ≡ red + magenta + violet rectangle. For the Base/Ext-MT Models: Control-Group (C2) ≡ blue rectangle; for the Base/Ext-ST Models: Control-Group (C3) ≡ blue + green rectangle. Adding H to C3 would form Control-Group (C4). Finally, TH and S form a hypothetical Control-Group (C2hyp) that always remain in a Gymnasium-8-model. 17 For the medium-term models, two Control-Groups may be formed by the remaining four federal states.
• Control-Group (C2): Rhineland-Palatinate (RP), Schleswig-Holstein (SH). In these two Western territorial states, all student cohorts attended a G-9-model Gymnasium during the whole time frame. They are the best match for the T3-Group.
• Hypothetical Control-Group (C2hyp): Saxony (SN), Thuringia (TH). These two Eastern federal states have followed a G-8-model since 1949, when the GDR was founded, and maintained their secondary school system after reunification. As they always remained in a G-8-model, they cannot contribute to estimating the causal effect of shortening secondary school duration. However, they can form a hypothetical Control-Group (C2hyp) to estimate the effect of the reform relative to the counterfactual of a permanent G-8-model system.
In summary, Table 1 already indicates that the most comparable T/C-setting for the medium-term models consists of the Treatment-Group (T3) and the Control-Group (C2), because it focuses on Western German territorial states that are very similar in relevant characteristics (see also Table 5). This setting still covers 37.6 out of 80.6 million inhabitants, i.e. about 47% of the German population, and it is therefore the main specification for the Base-MT model in the results section 7. However, as illustrated in section 7.2, I also conduct robustness checks using T5, T7 and C2hyp (Figure A.4).
For the short-term models, the following T/C-Groups with reform-time set between 2006 and 2009 exist:
• Neither T nor C: For the same reasons as in the medium-term models, I exclude three states: SL, ST and MWP. As tested 9th graders in these states were already treated one test period before the reform reference point, these states are neither a clean T- nor a clean C-Group across 2006 and 2009.
• The Treatment-Groups remain identical to those in the medium-term models, since only the year 2012 is dropped in the short-term models while the reform time is still set between 2006 and 2009 (T3 = BW, BV, LS; T5 = BW, BV, LS, BR, HB; T7 = BW, BV, LS, BR, HB, BE, BB).
• Control-Group (C2): RP, SH, and hypothetical Control-Group (C2hyp): Saxony (SN), Thuringia (TH). These two Control-Groups remain the same as in Base-MT, since excluding 2012 does not change the allocation of these federal states.
• Control-Group (C3): RP, SH, North Rhine-Westphalia (NRW). NRW, the Western territorial state with the largest population in Germany, can now be added to C2, because in the Base-ST (Full-ST) model its 9th graders were taught in a G-9-model Gymnasium throughout the period from 2003 (2000) until 2009.
• Control-Group (C4): RP, SH, NRW, Hesse (H). Adding H to C3 includes another Western territorial state in the Control-Group. This requires assuming that H can still be classified as C-Group in 2009, when only about 10% of 9th graders may have been treated (compare Table 1).
Finally, the most comparable T/C-setting for the short-term models consists of the Treatment-Group (T3) and the Control-Group (C3). It covers 55.2 out of 80.6 million inhabitants, i.e. about 68% of the German population. Hence, I choose it as the main specification for the Base-ST model in the results section 7. However, robustness checks will be conducted using T5 and T7 as well as C2, C2hyp and C4. In section 6, I will continue to use the above Model-, Treatment- and Control-Group definitions to describe the details of the empirical strategy my analysis relies on (for an overview see Appendix A.2.1).
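The allocation rules of this section can be written down compactly. The Python sketch below is a hypothetical illustration, not the paper's code: it encodes, for each state, the first PISA test year in which tested 9th graders were taught in a G-8-model (read off Table 1) and classifies states for a window ending in 2012 (Base-MT) or 2009 (Base-ST), with the reform reference point between 2006 and 2009. Hesse is coded as untreated in 2009 (the "less than 10%" assumption behind C4), Saxony and Thuringia get the sentinel year 2000 for "always G-8", and the hypothetical group C2hyp as well as the Eastern/Western and city/territorial distinctions behind T3 and T5 are not produced by the rule and have to be imposed by hand.

```python
from typing import Optional

# First PISA test year in which tested 9th graders were in a G-8 Gymnasium (Table 1).
# None = still G-9 in 2012; 2000 = sentinel for "always G-8" (SN, TH).
# Hesse is coded as first treated in 2012, i.e. untreated in 2009 (<10% treated then).
FIRST_TREATED_PISA_YEAR: dict[str, Optional[int]] = {
    "BW": 2009, "BV": 2009, "LS": 2009, "BR": 2009, "HB": 2009, "BE": 2009, "BB": 2009,
    "RP": None, "SH": None,
    "NRW": 2012, "H": 2012,
    "SL": 2006, "ST": 2006, "MWP": 2006,
    "SN": 2000, "TH": 2000,
}


def is_treated(state: str, year: int) -> bool:
    """True if 9th graders tested in `year` were taught in a G-8 Gymnasium."""
    first = FIRST_TREATED_PISA_YEAR[state]
    return first is not None and year >= first


def classify(state: str, last_year: int, pre: int = 2006, post: int = 2009) -> str:
    """Return 'T', 'C' or 'neither' for a window ending in `last_year`."""
    if is_treated(state, pre):
        return "neither"  # treated too early (SL, ST, MWP) or always G-8 (SN, TH)
    if not is_treated(state, last_year):
        return "C"        # G-9 throughout the window
    if is_treated(state, post):
        return "T"        # switches between the pre and post test years
    return "neither"      # switches only after `post` (NRW, H in the medium term)


def groups(last_year: int) -> dict[str, list[str]]:
    """Collect the states falling into each category for a given window."""
    out: dict[str, list[str]] = {"T": [], "C": [], "neither": []}
    for state in FIRST_TREATED_PISA_YEAR:
        out[classify(state, last_year)].append(state)
    return {key: sorted(states) for key, states in out.items()}


print("Base-MT (2003-2012):", groups(2012))
print("Base-ST (2003-2009):", groups(2009))
```

Under these codings, the rule returns T7 and C2 for the medium-term window and T7 and C4 for the short-term window; C3 follows from C4 by dropping H.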
4.4 Internal Validity of the strategy
According to Meyer (1995), the main idea of a quasi-experiment consists in finding a source of exogenous variation in an explanatory variable. This avoids the estimation bias that arises when omitted variables affect both the explanatory and the dependent variable. However, to benefit from a quasi-experimental setting, three potential threats to internal validity should be discussed. A first important criterion for internal validity is the comparability of the Treatment- and Control-Group in the pre-reform period. On the one hand, the German federal states share a similar legislative, cultural and economic framework, and common qualification standards are coordinated by the SC. Thus, exploiting variation in the reform implementation across federal states can already be considered an improvement over relying on cross-national variation as in many existing studies (cf. Woessmann (2010)). On the other hand, by carefully comparing pre-reform characteristics of both groups, potential selection bias can be identified and ruled out (see section 6.2). Secondly, one should consider whether the reform effect is driven not only by the explanatory variable of interest (increased learning intensity), but also by other factors that endogenously affect the outcome variable (IEOp). For instance, anticipatory effects might in theory induce potentially affected students to move with their families into a state that had not yet implemented the Gymnasium-8-reform. If families of the cohorts tested in 2009 had moved out of a Treatment-Group (T-Group) state before the reform, i.e. before the school year 2004/2005, such moves might have changed the population composition across T- and C-Groups in a way that would bias the estimation results. However, such anticipatory behavior is very unlikely.
First, because the G-8-reform was implemented quickly across all federal states, with half of them introducing it within three school years (2003/2004 until 2005/2006, see Table A.2), the options for moving were limited. Moreover, there is no systematic pattern linking the timing of the G-8-reform to the geographical location of federal states (Andrietti, 2015). Furthermore, direct and indirect moving costs, including bureaucratic hurdles, appear to explain why the share of students changing secondary school across federal states has always remained low. In addition, strategic considerations regarding the competition for access to study programs support the assumption that selection bias due to moves between states is unlikely. Because of the reform, several double cohorts graduating between 2009 and 2016 temporarily increased the number of applicants for university studies in Germany. As this lowered the probability of immediately entering a desired study program, completing the G-8-model allowed a student at least to “insure” herself against the risk of effectively adding a 14th year in the form of a gap year.51 Finally, a key criterion for internal validity is the common time trend (CTT) assumption (Angrist and Pischke, 2008), which requires that, in the absence of the reform, Treatment- and Control-Group would have followed similar time trends.52 Although this cannot be tested directly, placebo tests (cf. Bertrand, Duflo, and Mullainathan (2004)) for different pre-reform periods can be conducted as robustness checks for the internal validity of this paper's strategy. Moreover, as argued in section 4.3, in the settings of Treatment-Group (T3/T5/T7) vs. Control-Group (C2) in model Base-MT, or vs. Control-Groups (C2/C3/C4) in model Base-ST, PISA 2009 is the first test year with affected 9th graders in all Treatment-Group states, and in all T3 states the reform was enacted in the same school year, 2004/2005.
Thus, the quasi-experimental setting as described in Table 1 is unlikely to suffer from estimation bias due to non-random reasons for introducing the G-8-reform slightly earlier or later among federal states.53
51 Instead of spending 13 years at school and then possibly having to wait one additional year before entering a desired study program, a student with 12 years of schooling could use the “saved” year as such a gap year before enrolling at university.
52 In other words, there are no compositional changes in the student body prior to the reform, and PISA test scores would have followed the same patterns in both Treatment- and Control-Group in the absence of reform-induced changes in learning intensity.
53 As explained in Appendix A.2.2, the T/C-settings in Base-ST/MT take into account the political parties in government.
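To make the two-step DID logic and the placebo test concrete, the Python sketch below relies on simplifying assumptions that are not taken from the paper. In a first step, IEOp in each group-year cell is measured as the R² of an OLS regression of test scores on circumstance variables (one common reading of "the share of score variation explained by circumstances"); in a second step, a 2x2 DID is taken across cells. The placebo test reapplies the same DID to two pre-reform years (2003 vs. 2006), where the estimate should be close to zero if the common time trend assumption holds. Data are simulated, PISA weights and plausible values are ignored, and all names are hypothetical.

```python
import numpy as np


def ieop_r2(scores, circumstances) -> float:
    """Step 1: share of score variance explained by circumstances (OLS R^2)."""
    y = np.asarray(scores, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(circumstances, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))


def did(cells: dict, pre: int, post: int) -> float:
    """Step 2: 2x2 difference-in-differences on cell-level IEOp indices.

    `cells` maps (group, year) -> IEOp index, with group 'T' or 'C'.
    """
    return (cells[("T", post)] - cells[("T", pre)]) - (cells[("C", post)] - cells[("C", pre)])


# Simulated data: one circumstance index matters more in T after the reform (2009 only).
rng = np.random.default_rng(0)
cells = {}
for group in ("T", "C"):
    for year in (2003, 2006, 2009):
        ses = rng.normal(size=2000)
        effect = 30.0 + (15.0 if group == "T" and year == 2009 else 0.0)
        scores = 550.0 + effect * ses + rng.normal(scale=60.0, size=2000)
        cells[(group, year)] = ieop_r2(scores, ses)

print(f"reform DID  (2006 -> 2009): {did(cells, 2006, 2009):+.3f}")
print(f"placebo DID (2003 -> 2006): {did(cells, 2003, 2006):+.3f}")
```

With this data-generating process the population R² is 0.20 in all cells except T in 2009 (0.36), so the reform DID should come out roughly around +0.16 and the placebo DID near zero, up to sampling noise.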
5 Data
Background information on the OECD's PISA data, its advantages and disadvantages for measuring educational outcomes, and the standardization procedure applied to test scores is provided in Appendix A.1.2. In this section, I first describe which specific PISA data are used to analyze the Gymnasium-8-reform and then provide some basic descriptive statistics for this dataset.
5.1 PISA Data used in this project
For Germany, two types of PISA test data are available (for information on data sources see Appendix A.1.1). First, for each testing cycle, two random samples of students taking the same test on the same day were drawn. After schools had been randomly selected in each of the 16 federal states, on the one hand, about 25 students were randomly drawn on the basis of being 15 years old, as the international cross-country comparison relies on the age-based sample (compare ??); on the other hand, students were randomly drawn on the basis of being in the 9th grade. For this purpose, two classes of 9th graders with a minimum of 25 students were randomly chosen within the selected schools, so that the grade-based samples include about twice as many students as the typical age-based sample. For the purpose of this study, as explained in section 4, I rely on the representative random sample of students drawn on the basis of being in the 9th grade, since the G-8-reform affected students according to their grade in a given school year (compare Table 1). For each PISA cycle, this 9th-grade sample consists of roughly 8,500 to 10,000 students from about 225 schools (compare Table 2).
Table 2: Available grade-sample based PISA-I datasets

| | PISA-2000 (b) | PISA-2003-I | PISA-2006-I | PISA-2009-I | PISA-2012-I |
|---|---|---|---|---|---|
| Period | "before" reform | "before" reform | "before" reform | "after" reform | "after" reform |
| student-dataset: # of variables | 914 | 1,292 | 1,095 | 1,231 | 1,215 |
| # of students (a) | 34,754 | 8,559 | 9,577 | 9,460 | 9,998 |
| test scores (d) | reading, mathematics, sciences | reading, mathematics, sciences | reading, mathematics, sciences | reading, mathematics, sciences | reading, mathematics, sciences |
| school-dataset: # of variables | 470 | 572 | 565 | 534 | 502 |
| # of schools | 1,342 | 216 | 226 | 226 | 230 |
| teacher-dataset (c): # of variables | - | 653 | - | 639 | 257 |
| # of teachers | - | 1,939 | - | 2,201 | 2,084 |

(a) Number of student observations as included in the PISA datasets (2000, 2003, 2006, 2009, 2012) available from the Institut zur Qualitätsentwicklung im Bildungswesen (Institute for Educational Quality Improvement, IQB), based on the grade-based sample (see also Appendix A.1.1). Note that the student-dataset here includes both the original student questionnaire answers and those of the parents.
(b) For the year 2000, no specific grade-based PISA-I sample was available from the IQB. However, the PISA-2000-E dataset used instead is 9th-grade-based (Baumert, Artelt, Klieme, Neubrand, Prenzel, Schiefele, Schneider, and Weiß, 2002). Therefore, it has fewer variables, but more observations than the other datasets.
(c) For the years 2000 and 2006, the teacher-dataset was not part of the German-specific PISA dataset provided by the IQB.
(d) Each testing cycle focused on one domain: reading (2000, 2009), mathematics (2003, 2012) and sciences (2006).
Second, national extensions (PISA-E samples) were conducted in Germany for the years 2000, 2003 and 2006, each consisting of about 45,000 to 50,000 students. For this purpose, one day after the tests for the PISA-I samples, additional students in each federal state, randomly selected according to the same two-stage survey design, underwent the same testing procedure with a national questionnaire. Combined with the original PISA-I samples, this produced enlarged grade-based and/or age-based PISA-E samples. By oversampling less populated federal states, these extensions aimed to enable more robust comparisons of educational performance among German states. However, PISA-E was discontinued in 2009, and the Standing Conference of the Ministers of Education and Cultural Affairs of the Länder (SC) replaced it by the IQB-Ländervergleichstest (federal state comparison test). From 2009 onwards, this comparison test assesses the national educational standards determined by the SC. Notably, scores in the IQB federal state comparison test are adjusted to resemble the PISA-E testing scale, which allows the different extended datasets to be compared over time (Baumert and Prenzel, 2009). However, as only reading scores were recorded for 2009 and only mathematics and science scores for 2012, a regular cross-section for all three testing domains cannot be constructed from these larger datasets (for an overview see Table A.1).
By contrast, for the grade-based PISA-I samples, all three test score domains are available for all testing cycles (Table 2). The tests are designed to allow the evolution of scores across federal states to be traced over time. Therefore, although smaller than the PISA-E samples, the PISA-I samples are still large enough, and by applying the associated weights to each student observation, the representativeness of the data can be maintained.54 For consistency and comparability across the studies used, this paper relies on the grade-based PISA-I samples for all available years and avoids mixing PISA-E and PISA-I datasets. In that regard, my sample differs from that of Andrietti (2015) or Huebener et al. (2016), who combine PISA-E-2000/2003/2006 with PISA-I-2009.55 Thus, the empirical analysis is based on a dataset pooling the grade-based samples from PISA-I-2003/2006/2009/2012 and, for the extended time period models, PISA-E-2000 (Table 2). However, the Base-MT (2003-2012) and Base-ST (2003-2009) models are the preferred specifications, as they do not require mixing PISA-I and PISA-E-2000 datasets.56
In summary, this paper relies on the grade-based PISA-I samples to construct a representative repeated cross-section of German 9th graders in Gymnasium, which allows IEOp to be analyzed in response to the G-8-reform using variables based on students' PISA test scores and their background characteristics. About one third of secondary school students attend Gymnasium, and, consistent with this, about one third of the 9th-grade sample is in Gymnasium. Finally, due to resource constraints, the analysis is mainly restricted to datasets containing variables derived from the student and parent questionnaires (student-dataset in Table 2). Given that the IQB so far does not provide access to all teacher-datasets (esp. for 2006), this paper refrains from using data from teacher questionnaires (teacher-dataset in Table 2). However, questions that reappear for cross-checking purposes in the student, teacher or principal questionnaires (e.g. age, gender) are included in the dataset used for this paper.
54 Therefore, sample size, test score scales and the main background information collected by the questionnaires for students, parents and school principals are for the most part comparable across PISA-I studies (Baumert and Prenzel, 2009).
55 Note that the IQB only provides age-based samples for PISA-E-2006. Thus, restricting the dataset to 15-year-olds in the 9th grade would produce a much smaller sample for 2006 than for 2000. Such a dataset might be more prone to estimation bias, because merging two datasets originally designed to be representative of different target populations raises questions about its representativeness. However, as a robustness check, one may try to replicate this paper's approach using the grade-based German-specific versions of the PISA-E studies 2000, 2003 and 2006 together with the PISA-I studies 2009 and 2012.
56 Note that for the year 2000, no specific grade-based PISA-I sample was available from the IQB. However, the PISA-2000-E dataset used instead is 9th-grade-based (Baumert et al., 2002). In that case, instead of the usual 80 replicate weights, there are about 768. More care regarding weighting is also required in the larger samples (PISA-E or IQB-LV), as the number of student observations may vary across testing domains and different weights may be required per domain.
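Footnote 56 refers to replicate weights. For PISA cycles with 80 replicate weights, the usual sampling-variance estimator is Fay's balanced repeated replication (BRR) with a Fay factor of 0.5: the statistic is re-estimated with each replicate weight, and the squared deviations from the full-sample estimate are summed and scaled by 1/(G(1-k)^2). The Python sketch below applies this to a weighted mean; the function names and the simulated weights are assumptions (real replicate weights come with the data files), and it ignores the additional imputation variance that arises because PISA scores are reported as plausible values.

```python
import numpy as np


def weighted_mean(y: np.ndarray, w: np.ndarray) -> float:
    """Weighted mean of y with weights w."""
    return float(np.sum(w * y) / np.sum(w))


def brr_mean_and_se(y, final_weight, replicate_weights, fay_k: float = 0.5):
    """Weighted mean and its Fay-BRR standard error.

    replicate_weights: array of shape (n_students, G), e.g. G = 80 in PISA-I.
    """
    y = np.asarray(y, dtype=float)
    theta = weighted_mean(y, np.asarray(final_weight, dtype=float))
    reps = np.asarray(replicate_weights, dtype=float)
    n_reps = reps.shape[1]
    theta_reps = np.array([weighted_mean(y, reps[:, g]) for g in range(n_reps)])
    variance = np.sum((theta_reps - theta) ** 2) / (n_reps * (1.0 - fay_k) ** 2)
    return theta, float(np.sqrt(variance))


# Simulated example only: 500 students, final weights and 80 replicate weights.
rng = np.random.default_rng(1)
scores = rng.normal(580, 80, size=500)
final_w = rng.uniform(5, 15, size=500)
# Fay replicates scale weights by 1.5 or 0.5; here the factors are drawn at random
# per student just to exercise the function, not by variance strata as in PISA.
factors = rng.choice([0.5, 1.5], size=(500, 80))
rep_w = final_w[:, None] * factors
est_mean, se = brr_mean_and_se(scores, final_w, rep_w)
print(f"weighted mean = {est_mean:.1f}, BRR s.e. = {se:.2f}")
```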
5.2 Descriptive statistics and Control-Variables
First, Table 3 shows the descriptive statistics of the main outcome variables, the PISA test scores in the domains of Reading, Mathematics and Science. As expected, because the sample focuses on students in the academic track of secondary school, mean PISA test scores are above the German average. A typical 9th grader in Gymnasium consistently scores about 60 points higher than the average German 9th grader, which corresponds to about one entire proficiency level (??), i.e. the value-added of two school years. Across the three testing areas, Gymnasium students have performed worst in Reading. Moreover, their reading skills appear to have stagnated or even slightly deteriorated between 2000 and 2012. This is in line with reports on German PISA results for the years 2000-2009 showing that German students perform better in Maths than in Reading (cf. Klieme, Artelt, Hartig, Jude, Köller, Prenzel, Schneider, and Stanat (2011)); in this sample, the average score in Maths (about 580) exceeds that in Reading (about 570). Students perform best in Sciences, reaching up to 590 points. For Maths and Sciences, no clear time pattern in scores can be detected. Furthermore, as Table 3 shows, the median exceeds the mean in all three testing areas. This indicates a longer tail at the low end of the performance scale: some students perform relatively poorly, which pulls the mean below the median.57 As laid down in section 4.3, and given which datasets are most comparable (section 5.1), an analysis of the Gymnasium-8-reform requires comparing the Treatment-Groups (T3/T5/T7) with the Control-Groups (C2/C2hyp) in model Base-MT, and additionally with the Control-Groups (C3/C4) in model Base-ST. Table 3 also gives the allocation of the representative students, whose number increases with each PISA testing cycle from 2003 onwards, to the different T- and C-Groups.
Following this grouping, the estimation sample contains at least 6,223 observations in the Base-MT model and at least 4,297 in the corresponding Base-ST case. Finally, the dataset in this paper contains more than 60 schools per test year across all federal states, and the number of students increases with each testing cycle.
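The lower bounds stated above can be traced back to the cell counts in Table 3. The short sketch below sums them, assuming that the bounds refer to the main specification T3 vs. C2 (section 6) and that Base-ST pools the cycles 2003-2009 (the C3/C4 rows of Table 3 have no 2012 entry). The dictionary and function names are hypothetical.

```python
# Number of students per group and PISA cycle, copied from Table 3.
TABLE3_COUNTS = {
    "T3": {2003: 987, 2006: 1188, 2009: 1467, 2012: 1626},
    "C2": {2003: 153, 2006: 194, 2009: 308, 2012: 300},
}


def pooled_size(groups, cycles):
    """Total number of students in the given groups, pooled over the given cycles."""
    return sum(TABLE3_COUNTS[g][c] for g in groups for c in cycles)


base_mt = pooled_size(["T3", "C2"], [2003, 2006, 2009, 2012])
base_st = pooled_size(["T3", "C2"], [2003, 2006, 2009])  # assumed Base-ST window
print(base_mt, base_st)  # 6223 4297
```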
Regarding the selection of relevant control variables, this study follows the most common approaches in the literature (compare section 3). To begin with, as illustrated by Gamboa and Waltenberg (2012) and explained in section 2, the choice of control variables when measuring IEOp is always debatable, because one needs to include as controls exactly those variables that represent circumstances, i.e. factors which are beyond the control of a student but which explain part of the dependent variable of interest, i.e. cognitive skills as measured by test scores. Table 4 provides an overview of the main control variables used in the base model specification (Base-MT (2003-2012)).58 Given data restrictions, the control variables can be divided into two main groups: student-level circumstances, i.e. personal characteristics, and socio-economic family background variables, such as parental household characteristics. Concerning student-level controls, students are, as expected, on average about 15.4 years old and thus around the age of 15. The share of female students is slightly greater than that of male students (53% of the sample being female). This reflects the fact that in recent years female participation in Gymnasium has been consistently higher than that of male students (Prenzel, Sälzer, Klieme, and Köller, 2013). Finally, the migration background variable indicates that about 17% of students have at least one foreign-born parent, which may reflect a different or additional cultural identification compared to the German one
57The mean-median comparison and its evolution over time may be regarded as a first indication of whether IEOp changes over time. In Table 3, median and mean seem to deviate more after than before the reform. 58For simplicity, Table 4 only summarizes the main control variables in the pooled version of the main model specification (Base-MT (2003-2012)).
predominant in school. The variable language spoken at home is additionally taken into account within the category of migration background, because, in combination with the birthplace-based migration variable, it improves the extent to which one controls for a student's migration background. In fact, depending on the level of parental integration, one might expect that not all students with a migration background speak a language other than German at home. And indeed, the share of students who report speaking a different language at home (5.5%) is only about a third of the share of students with a migration background (16.8%) (see Table 4). Clearly, all these individual characteristics (gender, age and migration background) can be classified as circumstances (as defined in section 2). Next, another set of control variables involves the socio-economic family background. First, an important circumstance variable is a student's parental education background, i.e. the highest educational attainment among a student's parents. In the spirit of Todd and Wolpin (2003), parental education can be considered a key factor in the student's human capital production. Moreover, it is likely an indicator of potential support opportunities available to a student; at the same time,
Table 3: Descriptive Statistics: Outcome Variables and Sample Size
                              "before" reform                           "after" reform
                              PISA-2000     PISA-2003-I   PISA-2006-I   PISA-2009-I   PISA-2012-I
PISA test scores of 9th graders in Gymnasium
Reading: mean                 577.92        570.77        568.20        562.65        565.42
Reading: SD                   55.86         51.98         56.97         55.25         52.81
Reading: median               578.83        572.14        571.50        566.23        567.06
Mathematics: mean             573.65        583.66        571.39        578.53        575.73
Mathematics: SD               62.18         57.85         58.48         56.59         58.52
Mathematics: median           572.68        584.70        571.19        580.47        576.19
Science: mean                 575.14        591.15        585.01        590.48        580.44
Science: SD                   67.43         60.20         61.47         58.88         58.61
Science: median               576.35        594.80        587.12        594.68        581.07
Number of federal states      16            16            16            16            16
Number of schools             409           62            67            68            78
Number of students            10,276        3,017         3,356         3,473         3,910
Treatment-Group (T3)          1,917         987           1,188         1,467         1,626
Treatment-Group (T5)          3,175         1,090         1,275         1,568         1,761
Treatment-Group (T7)          4,524         1,412         1,587         1,778         2,029
Control-Group (C2)            1,225         153           194           308           300
Control-Group (h) (C2hyp)     1,387         312           295           118           162
Control-Group (h) (C4h)       2,612         465           489           426           462
Control-Group (C3)            1,830         872           989           1,159         -
Control-Group (C4)            2,434         1,062         1,272         1,458         -
Note: This table reports summary statistics for the sample of 9th-graders in Gymnasium, weighted by the sample weights provided in the PISA dataset from the IQB. Note that the average across plausible values can be taken as a metric of individual-level performance (OECD, 2012). For further information on the test scores and their weighting procedure, I refer to Appendix A.1.2. Mean, standard deviation and median of the test scores across all federal states and all academic-track schools in the German PISA dataset are provided for each testing cycle (2000 (see footnotes a and b in Table 2), 2003, 2006, 2009, 2012). Finally, the number of observations for the different Treatment- and Control-Groups (section 4.3) is provided.
students at the age of 15 are unlikely to be able to change their parents' educational attainment.
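Footnote 57 treats the gap between median and mean in Table 3 as a first indication of changes in IEOp. The sketch below, a hypothetical helper rather than part of the paper's estimation code, recomputes median minus mean for each domain and cycle from the rounded Table 3 values; a positive gap points to a longer lower tail that pulls the mean below the median.

```python
# Rounded values from Table 3 (cycles 2000, 2003, 2006, 2009, 2012).
CYCLES = [2000, 2003, 2006, 2009, 2012]
TABLE3 = {
    "Reading": {"mean": [577.92, 570.77, 568.20, 562.65, 565.42],
                "median": [578.83, 572.14, 571.50, 566.23, 567.06]},
    "Mathematics": {"mean": [573.65, 583.66, 571.39, 578.53, 575.73],
                    "median": [572.68, 584.70, 571.19, 580.47, 576.19]},
    "Science": {"mean": [575.14, 591.15, 585.01, 590.48, 580.44],
                "median": [576.35, 594.80, 587.12, 594.68, 581.07]},
}


def median_minus_mean(domain):
    """Median minus mean per cycle; positive values suggest a longer lower tail."""
    stats = TABLE3[domain]
    return {cycle: round(med - mean, 2)
            for cycle, med, mean in zip(CYCLES, stats["median"], stats["mean"])}


for domain in TABLE3:
    print(domain, median_minus_mean(domain))
```

The output makes it easy to compare the pre-reform cycles (2000-2006) with the post-reform cycles (2009-2012) domain by domain.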
Table 4: Descriptive Statistics: Control Variables - Circumstances
Base-MT (2003-2012)                           Mean      Std. Dev.   Comments
Student-level characteristics
  female (dummy)                              0.5289    0.4989      min-max: [0-1]
  age in years                                15.43     0.49        min-max: [13.75-17.25]
  migration background (base category: German language / student and both parents born in Germany)
  - language spoken at home                   0.0552    0.2285      min-max: [0-1]; missing: 0.0060 (0.0774)
  - migration background                      0.1679    0.3738      min-max: [0-1]; missing: 0.0060 (0.0774)
Parental characteristics
  parental education (highest ISCED level)
  - ISCED level 5-6                           0.6285    0.4832      for an explanation of ISCED, see Figure A.10
  - ISCED level 3-4 [base category]           0.2812    0.4495      min-max: [0-1]
  - ISCED level 1-2                           0.0532    0.2244      missing: 0.0371 (0.1890)
Socio-economic status
  number of books in household
  - +500                                      0.2029    0.4022
  - 101-500 [base category]                   0.4703    0.4991      min-max: [0-1]
  - 11-100                                    0.2579    0.4375      missing: 0.0497 (0.2174)
  - max. 10                                   0.0193    0.1375
  highest ISEI level of job in the family     57.1536   17.2042     min-max: [0-90]; missing: 0.0177 (0.1317)
Family characteristics
  family structure: growing up in a single-parent household?
  - single-parent household [base: no]        0.1317    0.3382      min-max: [0-1]; missing: 0.0808 (0.2726)
  family structure: mother/father employment status
  father
  - full-time (FT) [base category]            0.8120    0.3907
  - part-time (PT)                            0.0584    0.2345      min-max: [0-1]
  - unemployed (UE)                           0.0251    0.1564      missing: 0.0728 (0.2598)
  - out of labor force (OLF)                  0.0318    0.1753
  mother
  - full-time (FT) [base category]            0.2972    0.4570
  - part-time (PT)                            0.4379    0.4961      min-max: [0-1]
  - unemployed (UE)                           0.0452    0.2078      missing: 0.0603 (0.2381)
  - out of labor force (OLF)                  0.1593    0.3660
Number of students                            13,756
G-8-reform dummy                              0.4573    (0.4982)
Note: This table reports summary statistics for the sample of 9th-graders in Gymnasium, pooling the data for the medium-term basic model specification (Base-MT (PISA-I-2003/2006/2009/2012)) (see section 4.3), weighted by the sampling weights provided in the PISA dataset from the IQB. The comments column reports the share of missing observations, with standard deviations in parentheses. For categorical control variables, the base category is indicated in square brackets. Finally, the number of observations and the share of the G-8-reform dummy are provided.
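The notes to Tables 3 and 4 state that statistics are weighted by the PISA sampling weights and that the average across a student's plausible values can serve as an individual-level score. A minimal sketch of that computation follows. The data and function names are hypothetical, five plausible values per domain are assumed (as provided in PISA 2000-2012), and averaging the plausible values first is the simplification described in the Table 3 note: the weighted mean is the same as averaging the five per-plausible-value weighted means, whereas the standard deviation is not.

```python
import math


def pv_average(plausible_values):
    """Student-level score: the average of the student's plausible values."""
    return sum(plausible_values) / len(plausible_values)


def weighted_mean_sd(values, weights):
    """Weighted mean and weighted (population-style) standard deviation."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(var)


# Hypothetical data: five plausible values and a sampling weight per student.
students = [
    {"pv": [550.0, 560.0, 570.0, 560.0, 560.0], "weight": 3.0},
    {"pv": [590.0, 600.0, 610.0, 600.0, 600.0], "weight": 1.0},
]
scores = [pv_average(s["pv"]) for s in students]
mean, sd = weighted_mean_sd(scores, [s["weight"] for s in students])
print(f"weighted mean = {mean:.2f}, weighted SD = {sd:.2f}")
```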
For the purpose of measuring parental education, I rely on the International Standard Classification of Education (ISCED). It serves to identify whether the mother, the father, or at least one parent has achieved an academic degree, in which case the household is classified as an academic household, i.e. having ISCED (level 5/6) (see Table 4). In the sample, more than 60% of students grow up in an academic household. A medium education category includes students whose parents' highest educational attainment is upper-secondary or post-secondary non-tertiary education [ISCED (level 3/4)]. The ISCED (level 1/2) category includes only students whose parents have completed at most lower-secondary school. An overview of the definitions of these categories in the ISCED scale is provided in Figure A.10. In order to take into account the socio-economic status (SES) of a student's family background, I first use the number of books at home, a standard control variable available in all PISA studies, as an indicator of the SES environment in which a student grows up. Since no information on household income is available, this variable has been shown to be a good alternative proxy for family SES, as household income and SES are both highly correlated with the number of books in a household. Moreover, it is plausible to assume that students aged 15 are financially dependent on their parents and that their access to culture is strongly influenced by the opportunities offered in the household in which they grow up. Thus, it is widely accepted that for students aged 15 the number of books represents a circumstance variable controlling for family SES.59 I take the category of 101-500 books as the base category for this variable, as it is the most common one, with about half of the students in the sample living in such a household.
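With a base category, each remaining category of a categorical circumstance enters the regressions as a 0/1 indicator. The sketch below codes the books variable of Table 4 this way, with 101-500 books as the omitted base category. Treating a missing answer as its own indicator is an assumption suggested by the "missing" entries in Table 4, and the function and column names are hypothetical.

```python
BOOK_CATEGORIES = ["max. 10", "11-100", "101-500", "+500"]
BASE_CATEGORY = "101-500"  # omitted reference category, as in Table 4


def book_dummies(category):
    """0/1 indicators for all non-base book categories plus a missing flag.

    `category` is one of BOOK_CATEGORIES, or None for a missing answer
    (coding missing answers as a separate indicator is an assumption)."""
    if category is not None and category not in BOOK_CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    row = {f"books_{c}": int(category == c)
           for c in BOOK_CATEGORIES if c != BASE_CATEGORY}
    row["books_missing"] = int(category is None)
    return row


print(book_dummies("101-500"))  # base category: all indicators are 0
print(book_dummies("+500"))
print(book_dummies(None))
```

Because the base category has no indicator of its own, each coefficient on a books dummy is read relative to students from households with 101-500 books.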
Similarly, the International Socio-Economic Index of Occupational Status (ISEI) can be used as a further control variable for socio-economic background.60 Higher ISEI scores correspond to higher occupational status. As parental occupational status is unlikely to be under the control of students, it also reflects circumstances due to parental SES.
Finally, to control for family structure characteristics, I first take into account whether a student grows up in a single-parent household. About 14% of all students are raised under such circumstances, which may involve, for instance, growing up in a more stressful environment. Second, I include employment status dummies for both mother and father, because parental employment determines time availability and family structure and thereby shapes the environment in which a student can learn for school. In the sample, the vast majority of fathers work full-time (FT) (more than 80%), whereas the largest group of mothers works part-time (PT) (about 44%). This is consistent with the family model still predominant in Germany during the 2000s, consisting of a breadwinning father and a part-time working mother mainly in charge of child care.61 The missing-value indicators included for all control variables show that the response rate was always above 90 percent (last column in Table 4). Students affected by the Gymnasium-8-reform constitute 45.73 percent of the sample (in the basic specification of the medium-term model Base-MT).
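The response-rate statement can be checked against the missing shares reported in Table 4. The sketch below (hypothetical names, values copied from Table 4) finds the variable with the largest missing share and confirms that the category shares of parental education plus its missing share add up to one, i.e. that missing answers form a separate category.

```python
# Shares of missing answers per control variable, copied from Table 4.
MISSING_SHARE = {
    "language spoken at home": 0.0060,
    "migration background": 0.0060,
    "parental education (ISCED)": 0.0371,
    "number of books": 0.0497,
    "highest ISEI": 0.0177,
    "single-parent household": 0.0808,
    "father employment status": 0.0728,
    "mother employment status": 0.0603,
}
# Category shares of parental education (ISCED 5-6, 3-4, 1-2) from Table 4.
ISCED_SHARES = [0.6285, 0.2812, 0.0532]

worst_variable = max(MISSING_SHARE, key=MISSING_SHARE.get)
lowest_response_rate = 1.0 - MISSING_SHARE[worst_variable]
print(f"lowest response rate: {lowest_response_rate:.2%} ({worst_variable})")

# Category shares plus the missing share should add up to one (up to rounding).
isced_total = sum(ISCED_SHARES) + MISSING_SHARE["parental education (ISCED)"]
print(f"ISCED shares incl. missing: {isced_total:.4f}")
```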
59As mentioned in section 2, Hufe et al. (2015) provide another reason why the number of books can be treated as a circumstance variable: below the age of consent (e.g. 16), a student's potential influence on the stock of books in the household is limited and hardly choice-driven. Instead, if a student aged 15 or younger made their parents buy books, this would rather indicate growing up in a more favorable SES, as most students under the age of 16 are influenced by the environment in which they grow up rather than vice versa. In other words, the number of books is associated with circumstances, at least during childhood. Young children in households with few books are unlikely to change this, because, being used to this environment, they may be less interested in reading and lack the financial resources. Therefore, more books likely correlate with a more favorable SES. 60As an alternative, one may use the International Standard Classification of Occupations (ISCO) to determine parental SES. Parents' occupational data were obtained by asking open-ended questions, the responses to which were coded into ISCO codes. However, ISCO codes are not available for all PISA datasets, in contrast to their mapping into ISEI indexes. 61In fact, with a school system based mostly on half-day schooling and with working parents facing only a limited supply of after-school child care, the family structure described above and predominant in the data, with a FT working father and a mostly PT working mother, has been the norm in Western Germany for many decades. However, since the late 2000s, a slow expansion of all-day schools has started, which may change the situation of student generations born during the 2000s, a group of students that is not part of the sample under investigation in this paper's analysis period (2003-2012).
6 Empirical Strategy
This section first briefly explains the empirical strategy for analyzing the effects of increased learning intensity on a measure of IEOp (see section 2) by exploiting the 'quasi-experimental' setting of the Gymnasium-8-reform (see section 4.3). Second, section 6.2 provides evidence for the appropriateness of focusing the regression analysis in section 7 on the main specifications, namely Treatment/Control-Groups T3 vs. C2 for the medium-term model (Base-MT) and T3 vs. C2/C3/C4 for the short-term model (Base-ST).
6.1 Methodology and Estimation Designs
Analyzing how the G-8-reform, by increasing learning intensity, changed educational opportunities for students in Gymnasium involves a two-step estimation procedure. First, appropriate measures of EEOp or IEOp need to be estimated given the available outcome and control variables in the data. Second, exploiting the quasi-experimental reform setting with the estimated IEOp measures as dependent variable, the effect of interest can be obtained. Starting with the first step, this analysis follows the argumentation of section 2 on how to measure IEOp. That is, $\hat{\theta}_{IOP}$ (equation (7)) will be the measure of IEOp. In other words, the $R^2$ from an OLS regression of PISA test scores on the circumstance variables $C$ first has to be obtained for the different Treatment- and Control-Groups in each of the time period models (see sections 4.3 and 5.2). Thus, for both the medium- and the short-term perspective, the following regression model is run separately for all available Treatment-Groups (T3/T5/T7) and Control-Groups (C2, C2hyp and/or C3, C4), twice: once for the period before the reform ((2000)-2003-2006) and once for the period after the reform (2009-(2012)).62
$$
\begin{aligned}
Y_{ist} = {} & \beta_0 + \beta_1 (\text{Individual Characteristics})_{ist} + \beta_2 (\text{Parental Characteristics})_{ist} \\
& + \beta_3 (\text{Socio-Economic Status})_{ist} + \beta_4 (\text{Family Characteristics})_{ist} \\
& + FE(\text{federal state/school})_{s} + \varepsilon_{ist}
\end{aligned}
\tag{9}
$$
where $Y_{ist} = \{stdpvread_{ist};\ stdpvmath_{ist};\ stdpvscie_{ist}\}$ is the outcome variable, i.e. the test score of student $i$ in federal state $s$ at time $t$ in one of the three PISA test domains (compare Table 2). To ease the interpretation of the $\beta$ coefficients in equation (9), test scores are standardized, so that coefficients represent effects in fractions of a standard deviation of PISA test scores.63 Following Table 4, equation (9) distinguishes four (six) sets of control variables that capture the relevant circumstances (Appendix A.2.1).
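The first step boils down to computing one $R^2$ per group and period. The following is a minimal, dependency-free sketch, assuming unweighted OLS (if the PISA sampling weights are to be used, weighted least squares would be needed instead) and a design whose columns are already-coded circumstance variables. The function names `solve` and `ols_r_squared` and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design: circumstance variables are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_r_squared(circumstances, scores):
    """R^2 of an unweighted OLS regression of scores on circumstances plus an intercept.

    `circumstances` holds one row per student of already-coded circumstance
    variables (e.g. dummies, ISEI); `scores` holds the (standardized) test scores."""
    rows = [[1.0] + [float(v) for v in x] for x in circumstances]
    k = len(rows[0])
    # Normal equations (Z'Z) b = Z'y.
    ztz = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    zty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(k)]
    beta = solve(ztz, zty)
    fitted = [sum(b * z for b, z in zip(beta, r)) for r in rows]
    mean_y = sum(scores) / len(scores)
    ssr = sum((y - f) ** 2 for y, f in zip(scores, fitted))
    sst = sum((y - mean_y) ** 2 for y in scores)
    return 1.0 - ssr / sst


# Hypothetical toy data: one dummy circumstance, four students.
print(ols_r_squared([[0], [0], [1], [1]], [0.0, 2.0, 2.0, 4.0]))  # 0.5
```

Run once per Treatment- or Control-Group and per pooled pre- or post-reform sample, the returned value plays the role of the IEOp measure for that cell.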
62Note that until section 7.2, I focus in notation on the main specifications, the Base models covering (2003-2012) (Appendix A.2.1). Furthermore, the $R^2$ is calculated, where applicable, over the pooled sample of data in the respective pre-reform years and then separately over the corresponding pooled post-reform sample, with the general reform time set to take effect between 2006 and 2009, as explained in section 4. 63The PISA test scores on the 500-point scale metric as described in ?? have thus been standardized to a mean of 0 and a standard deviation of 1. Generally, three ways of standardizing appear reasonable. 1. Standardizing test scores with respect to the students in Gymnasium who are part of the representative grade-based German PISA test cohort in the respective test year ($stdpvsubject_{2}$): this allows interpreting coefficients relative to the performance in each testing year. 2. Standardizing test scores with respect to the pooled sample of all students in Gymnasium who are part of the representative grade-based German PISA test cohort in any of the test years that form the whole sample (e.g. 2003, 2006, 2009, 2012 in Base-MT) ($stdpvsubject_{3}$): this allows interpreting coefficients relative to the performance of students across the whole sample period. 3. Standardizing test scores with respect to the students in Gymnasium who are part of the representative grade-based German PISA test cohort in a reference test year, the first year in the respective time period model (e.g. 2003 in Base-ST/MT: $stdpvsubject_{2003}$): this allows interpreting coefficients relative to the performance of students in a reference year.
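Footnote 63 lists three ways of standardizing test scores. The sketch below contrasts them on hypothetical, unweighted data (the actual variables would be built from the German PISA samples and their sampling weights); the dictionary names are hypothetical, and `statistics.fmean` and `statistics.pstdev` from the Python standard library supply the mean and the population standard deviation.

```python
import statistics


def standardize(scores, reference):
    """z-scores of `scores` using the mean and population SD of `reference`."""
    mu = statistics.fmean(reference)
    sigma = statistics.pstdev(reference)
    return [(s - mu) / sigma for s in scores]


# Hypothetical Gymnasium scores (PISA scale) for two test years.
scores_by_year = {
    2003: [560.0, 580.0, 600.0],
    2009: [550.0, 570.0, 590.0],
}
pooled = [s for year_scores in scores_by_year.values() for s in year_scores]

# 1. Relative to each test year itself (cf. stdpvsubject_2).
z_within = {y: standardize(v, v) for y, v in scores_by_year.items()}
# 2. Relative to the pooled sample of all test years (cf. stdpvsubject_3).
z_pooled = {y: standardize(v, pooled) for y, v in scores_by_year.items()}
# 3. Relative to the reference year 2003 (cf. stdpvsubject_2003).
z_ref = {y: standardize(v, scores_by_year[2003]) for y, v in scores_by_year.items()}

for year in scores_by_year:
    print(year, [round(z, 3) for z in z_ref[year]])
```

Under option 3, the lower 2009 scores show up as a negative mean z-score relative to 2003, whereas option 1 removes such level differences between years by construction.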
Individual characteristics (IC) include the circumstance variables age and gender as well as migration background. Since students were sampled based on being in the 9th grade, controlling for age takes into account differences in school entrance age and grade repetition, the latter potentially due to ability. A negative effect of age on test scores would therefore be in line with this reasoning. Controlling for gender takes into account that, given the associated literature, subject-specific differences in test score performance between male and female students can be expected. For instance, Niederle and Vesterlund (2010) find that female students tend to perform better in verbal reasoning but worse in mathematics than their male counterparts. Thus, a corresponding pattern in test scores could be anticipated. Migration background, a further fixed personal characteristic, has also been shown to be important in explaining students' achievement in standardized tests in Germany (Klieme et al., 2011). Migration traits appear to be negatively correlated with performance on average, for instance due to adverse implications for non-cognitive skills such as self-esteem, motivation or aspirations. Socio-economic family background control variables include Parental Characteristics (PC) such as parental education levels, SES indicators such as the number of books in the household, and Family Characteristics (FC) such as family structure. Parental education and SES are likely correlated with human capital investments in children and thus with students' academic performance. A more academically stimulating environment tends to have a positive impact on cognitive skill formation, and in that regard parental education can be assumed to constitute a circumstance capturing investments into a student's early childhood.
Similarly, favorable SES as measured by higher ISEI index values and/or more books in a household should be expected to have a positive impact on a student's test scores, because a higher SES of the family in which a student grows up may indicate better and easier access to support for school-related work, including test preparation. By contrast, growing up with a single parent or with unemployed parents might have a negative effect on test scores, because such family conditions may impair skill formation and worsen access to out-of-school support, e.g. due to economic constraints. Moreover, for all versions of equation (9), i.e. for each combination of time period model and T/C-Group, both pre- and post-reform, as well as for each set of control variables, federal-state fixed effects (FEs) or school-FEs can be included. State-FEs account for time-invariant differences in the outcome variables between federal states, for instance due to variations in state-level spending on education or in school policies. In addition, it is plausible to argue that the federal state in which a student grows up and goes to school represents a circumstance variable, because where parents decide to live when their children reach the grade for entering secondary school, usually around the age of 10 (section 4.4), is very likely beyond the student's control.64 Additionally using school-FEs allows taking into account quality differences among schools (within a federal state) and controlling for other school-level circumstances.65
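In equation (9), the state-FEs are part of the circumstance set, so whatever score variation they absorb counts toward the $R^2$-based IEOp measure. A useful special case is sketched below under the simplifying assumptions of no other regressors and no sampling weights: with only state dummies (plus an intercept), the OLS fitted value for each student is the mean of their state, so the $R^2$ equals the between-state share of the score variance. The function name and the state codes are hypothetical.

```python
from collections import defaultdict


def r_squared_state_dummies(scores, states):
    """R^2 of an OLS regression of scores on an intercept plus dummies for all
    but one state. With only group dummies, each fitted value is the mean of the
    student's state, so R^2 is the between-state share of the total variance."""
    by_state = defaultdict(list)
    for y, s in zip(scores, states):
        by_state[s].append(y)
    state_mean = {s: sum(v) / len(v) for s, v in by_state.items()}
    grand_mean = sum(scores) / len(scores)
    ssr = sum((y - state_mean[s]) ** 2 for y, s in zip(scores, states))
    sst = sum((y - grand_mean) ** 2 for y in scores)
    return 1.0 - ssr / sst


# Hypothetical standardized scores of four students in two states.
print(r_squared_state_dummies([1.0, 3.0, 5.0, 7.0], ["BY", "BY", "NW", "NW"]))
```

Adding the individual and family circumstances on top of the dummies can only raise this $R^2$, which is why the choice between state- and school-FEs matters for the level of the IEOp measure.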
Concerning the second step, a DID estimation method can be applied to identify the effect of the G-8-reform on the measure of IEOp estimated in the first step, equation (9). The staggered implementation of the reform at different points in time across federal states allows estimating the reform effect of increased learning intensity on IEOp by exploiting the differences between comparable T-/C-Groups (section 4.3). For model Base-MT, for example, before denotes the pre-reform period (2003-2006) and after the post-reform period (2009-2012).
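A minimal sketch of this second step, assuming, as in a standard two-period DID, that the reform effect is $D_{after} - D_{before}$ with $D = R^2_T - R^2_C$ computed within each period. The dictionary layout and the $R^2$ values are hypothetical and are not estimates from this paper.

```python
def did_on_r_squared(r2):
    """Difference-in-differences of IEOp measures (R^2 values).

    `r2` maps (group, period) -> R^2, with group in {"T", "C"} and
    period in {"before", "after"}."""
    d_before = r2[("T", "before")] - r2[("C", "before")]
    d_after = r2[("T", "after")] - r2[("C", "after")]
    return d_after - d_before


# Purely hypothetical R^2 values, not estimates from this paper.
example = {("T", "before"): 0.20, ("C", "before"): 0.18,
           ("T", "after"): 0.26, ("C", "after"): 0.19}
effect = did_on_r_squared(example)
print(f"DID estimate: {effect:+.2f}")
```

A positive value would mean that, relative to the control group, circumstances explain a larger share of the test-score variation in the treatment group after the reform, i.e. higher IEOp. The sketch does not address inference, which would require standard errors for the difference in $R^2$ values.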
64Evidence suggests that the vast majority of students do not change school during Gymnasium; moreover, as discussed in section 4.4, changing secondary school across federal states is uncommon and bureaucratically burdensome. 65However, when using school-level controls, more caution may be required: if discretionary factors play a role in deciding which Gymnasium a student attends, the school would not be entirely beyond a student's control. Results, however, do not change much when using different forms of FEs, suggesting that these concerns are not relevant in practice. Note that by applying school-FEs without state-FEs, one still controls for characteristics at both the school and the state level (as federal states are in charge of school policies). As the PISA test is not conducted in the same schools across years, the school-FEs are wave-specific.
Then, estimating the second step with $D_{before} = (R^2_T)_{before} - (R^2_C)_{before} \neq 0$, we get: