Establishing the Discrete-Time Survival Analysis Model
Total Page:16
File Type:pdf, Size:1020Kb
What will we cover? §11.1 p.358 Specifying a suitable DTSA model §11.2 p.369 Establishing the Discrete-Time Fitting the DTSA model to data §11.3 p.378 Survival Analysis Model Interpreting the parameter estimates §11.4 p.386 (ALDA, Ch. 11) Displaying fitted hazard and survivor §11.5 p.391 functions Comparing DTSA models using §11.6 p.397 goodness-of-fit statistics. John Willett & Judy Singer Harvard University Graduate School of Education May, 2003 1 Specifying the DTSA Model Specifying the DTSA Model Data Example: Grade at First Intercourse Extract from the Person-Level & Person-Period Datasets Grade at First Intercourse (ALDA, Fig. 11.5, p. 380) • Research Question: Whether, and when, Not censored ⇒ did adolescent males experience heterosexual experience the event intercourse for the first time? Censored ⇒ did not • Citation: Capaldi, et al. (1996). experience the event • Sample: 180 high-school boys. • Research Design: – Event of interest is the first experience of heterosexual intercourse. Boy #193 was – Boys tracked over time, from 7th thru 12th grade. tracked until – 54 (30% of sample) were virgins (censored) at end he had sex in 9th grade. of data collection. Boy #126 was tracked until he had sex in 12th grade. Boy #407 was censored, remaining a virgin until he graduated. 2 Specifying the DTSA Model Specifying the DTSA Model Estimating Sample Hazard & Survival Probabilities Sample Hazard & Survivor Functions Grade at First Intercourse (ALDA, Table 11.1, p. 360) Grade at First Intercourse (ALDA, Fig. 10.2B, p. 340) Discrete-Time Hazard is the Survival probability describes the conditional probability that the chance that a person will survive event will occur in the period, beyond the period in question given that it hasn’t occurred without experiencing the event: earlier: S(t j ) = Pr{T > j} h(t j ) = Pr{T = j | T > j} Estimated by cumulating hazard: Estimated by the corresponding ˆ ˆ ˆ sample probability: S(t j ) = S(t j−1)[1− h(t j )] n events ˆ j e.g., h(t j ) = n at risk j ˆ ˆ ˆ S(t9 ) = S(t8 )[1− h(t9 )] e.g. = 0.8778[1− 0.1519] 8 hˆ(t ) = = 0.1176 = 0.7444 Median lifetime, ML, is the amount of time it takes for half the 9 68 population (sample) to experience the event of interest. 3 Specifying the DTSA Model Specifying the DTSA Model Introducing a Predictor into the Person-Period Dataset What Impact Does Predictor PT have? Grade at First Intercourse (from ALDA, Fig. 11.5, p. 380) Computing Sample Hazard & Survivor Probabilities, by PT Grade at First Intercourse (from ALDA, Table 11.1, p. 360) Time-invariant predictor, PT, indicates the presence/absence of a “parenting transition” during the boy’s early life: • 0 = boy lived with both biological parents thru 7th grade (n=72, 40% of sample). • 1 = boy experienced one or more parenting transitions, up thru 7th grade (n=108, 60% of sample). 4 Specifying the DTSA Model Specifying the DTSA Model What Impact Does Predictor PT have? If We Want To Model Hazard What Problems Do We Face? Grade at First Intercourse (ALDA. Fig. 11.1, p. 358) Grade at First Intercourse (ALDA, Fig. 11.2, 363) Hazard is a Estimated hazard What impact does 1.0 predictor level have probability, 0.8 on elevation of bounded by hazard function? zero and one 0.6 One or more early What shape is 0.4 parenting transitions hazard function, 0.2 at each level of No early parenting transitions the predictor? 0.0 6789101112 Grade Estimated odds 1.0 p Odds = 1− p 0.8 One or more early are less of a 0.5 parenting transitions problem, bounded only 0.3 No early below, by zero parenting transitions 0.0 6789101112 Grade Estimated logit(hazard) 0.0 One or more early parenting transitions Log-Odds, or -1.0 logits, are no No early -2.0 problem at all, parenting transitions they are without bounds above -3.0 and below -4.0 6 7 8 9 101112 Grade 5 Specifying the DTSA Model Specifying the DTSA Model What Population Models Could’ve Generated These Data? What Statistical Model Could Have Generated The Data? Grade at First Intercourse Data (ALDA, Figure 11.3, p. 366) Grade at First Intercourse (ALDA, Figure 11.3, p. 366) Datapoints: Logit(hazard) + ⇒ PT = 0; • ⇒ PT = 1 Flat population 0.0 " Logit(hazard) 0.0 " logit-hazard, elevated when PT -1.0 " " " -1.0 " switches from zero " ! ! PT = 1 " ! ! to one. ! -2.0 ! -2.0 " ! PT = 0" ! PT = 1 -3.0 " -3.0 PT = 0 " ! ! -4.0 ! ! 6789101112 Population logit- -4.0 Grade Logit(hazard) hazard linear with 6789101112 0.0 " time, elevated when Grade PT switches from -1.0 " " zero to one. " ! ! What are the necessary features of a reasonable statistical -2.0 " ! ! PT = 1 model for discrete-time logit-hazard? -3.0 " PT = 0 ! ! They include: -4.0 6789101112 Population logit- • For each predictor value, there is a population logit-hazard Grade Logit(hazard) hazard has a function. 0.0 " general shape, • Each population logit-hazard function has an identical shape, elevated when PT -1.0 " " regardless of predictor value. " ! ! switches from zero • Differences in predictor value “shift” the logit-hazard function ! to one. -2.0 PT = 1 " ! “vertically” " – So, the vertical “distance” between pairs of hypothesized logit- -3.0 PT = 0 hazard functions is the same in every time period. ! ! -4.0 6789101112 Grade 6 Specifying the DTSA Model Specifying the DTSA Model How Do We Specify A Model That has These Features? How Does It Work? Grade at First Intercourse Grade at First Intercourse Data (ALDA, Fig. 11.4, p. 374) Add a set of dummy predictors to the person- logit h(tij ) = [α 7 D7 +K+α j D j +K+α12 D12 ] + β1PTi period dataset, to represent time generically When PT=0, you get the Baseline Logit-Hazard Function: logit h(tij)=[α7D7 +K+α12D12] When PT=1, the Baseline Function shifts “vertically”: logit h(tij ) = [Baseline Function ] + β1 Logit(hazard) 0.0 -1.0 , And specify a statistical model for discrete-time logit-hazard α12 β1 , , that looks like this: -2.0 α11 PT = 1 , α10 α9 logit h(tij ) = [α 7 D7 + +α j D j + +α12 D12 ] + β1PTi K K -3.0 PT = 0 , α7 ,α8 -4.0 How does the model work? How do we fit it to data? 6 7 8 9 10 11 12 (D7=1) (D8=1) ... ... ... (D12=1) Grade 7 Specifying the DTSA Model Specifying the DTSA Model Other Ways of Writing the DTSA Model Other Ways of Displaying the DTSA Model Grade at First Intercourse Data (ALDA, Table 11.2, p. 376) Grade at First Intercourse Data (ALDA, Fig. 11.4, p. 374) Logit(hazard) 0.0 When hazard is expressed on a logit -1.0 , scale, the vertical α12 β1 , , α11 distance between -2.0 PT= 1 α10 , functions is identical α9 -3.0 PT= 0, in every time period. α7 ,α8 -4.0 6 7 8 9 10 11 12 (D7=1) (D8=1) ... ... ... (D12=1) Grade Odds 0.8 When hazard is 0.6 expressed on an odds So, you can “de-transform” the entire logit-hazard model, scale, the functions and it starts to look recognizable (?): 0.4 are magnifications or diminutions of each 0.2 other -- they are PT= 1 1 PT= 0 proportional h(t ) = 0.0 ij 6 7 8 9 10 11 12 −{}[][]α D +...+α D + β X +β X +... 1 1ij J Jij 1 1ij 2 2ij Grade 1+ e Hazard 0.5 0.4 0.3 0.2 ? Notice how 0.1 PT= 1 we’ve begun PT= 0 0.0 to generalize 6 7 8 9 10 11 12 the model? Grade The “standard” DTSA model is a proportional odds model! 8 Fitting the DTSA Model to Data Fitting the DTSA Model to Data First, Add a Continuous Predictor to the pp Dataset Use Logistic Regression Analysis in the PP Dataset Grade at First Intercourse Data (ALDA, Fig. 11.5, p. 380) Grade at First Intercourse Use logistic regression analysis to fit the hypothesized DTSA model in the person- period dataset. Treat EVENT as the outcome, and regress it on the predictors: •Time indicators, D1 thru DJ , •Substantive predictors, PT and PAS. Using sensible data-analytic practices. PAS is a continuous time-invariant measure of parents’ antisocial behavior during the child’s formative years. Scores on the measure have been standardized to mean 0, standard deviation 1. All parameter estimates, standard errors, t- and z-statistics, goodness-of-fit statistics, and tests will be correct for the discrete-time hazard model 9 Fitting the DTSA Model to Data Interpreting Parameter Estimates Using Sensible Data-Analytic Skills to Produce a Interpreting Parameters Associated w/ the Time Dummies Taxonomy of Fitted DTSA Models Grade at First Intercourse (ALDA, Figs. 11.3 & 11.4, 386-8) Grade at First Intercourse (ALDA, Table 11.3., p. 386) Estimates of the parameters associated with the time dummies provide the fitted hazard function for the baseline group. In Model A, everyone is in the “baseline group” 10 Interpreting Parameter Estimates Interpreting Parameter Estimates Substantive Dichotomous Predictors Substantive Continuous Predictors Grade at First Intercourse Grade at First Intercourse Anti-logging again provides As in regular logistic a fitted odds-ratio, now regression analysis, anti- associated with a unit logging a parameter estimate difference in the continuous gives the fitted odds-ratio predictor, PAS: associated with a unit ˆ β PAS 0.4428 difference in the predictor: e = e =1.56 ˆ eβ PT = e0.8736 = 2.4 The fitted odds of first intercourse for boys whose The fitted odds of first intercourse parents exhibited a particular level of antisocial behavior for boys who have experienced a are 1.56 times the odds for boys whose parental parenting transition are 2.4 times antisocial behavior was one standard deviation lower.