An Introduction to Nonlinear Mixed Effects Models and PK/PD Analysis

Marie Davidian Department of North Carolina State University

http://www.stat.ncsu.edu/ davidian ∼

1 Outline

1. Introduction 2. Pharmacokinetics and pharmacodynamics 3. Model formulation 4. Model interpretation and inferential objectives 5. Inferential approaches 6. Applications 7. Extensions 8. Discussion

2 Some references

Material in this webinar is drawn from:

Davidian, M. and Giltinan, D.M. (1995). Nonlinear Models for Repeated Measurement . Chapman & Hall/CRC Press.

Davidian, M. and Giltinan, D.M. (2003). Nonlinear models for repeated measurement data: An overview and update. Journal of Agricultural, Biological, and Environmental Statistics 8, 387–419.

Davidian, M. (2009). Non-linear mixed-effects models. In Longitudinal Data Analysis, G. Fitzmaurice, M. Davidian, G. Verbeke, and G. Molenberghs (eds). Chapman & Hall/CRC Press, ch. 5, 107–141. Shameless promotion:

3 Introduction

Common situation in the biosciences: A continuous outcome evolves over time (or other condition) within • individuals from a population of interest Scientific interest focuses on features or mechanisms that underlie • individual time trajectories of the outcome and how these vary across the population A theoretical or empirical model for such individual profiles, typically • nonlinear in parameters that may be interpreted as representing such features or mechanisms, is available Repeated measurements over time are available on each individual in • a sample drawn from the population Inference on the scientific questions of interest is to be made in the • context of the model and its parameters

4 Introduction

Nonlinear mixed effects model: Also known as the hierarchical nonlinear model • A formal statistical framework for this situation • Much statistical methodological research in the early 1990s • Now widely accepted and used, with applications routinely reported • and commercial and free software available Extensions and methodological innovations are still ongoing • Objectives: Provide an introduction to the formulation, utility, and • implementation of nonlinear mixed models For definiteness, focus on pharmacokinetics and pharmacodynamics • as major application area

5 Pharmacokinetics and pharmacodynamics

Premise: Understanding what goes on between dose (administration) and response can yield information on How best to choose doses at which to evaluate a drug • Suitable dosing regimens to recommend to the population, • subpopulations of patients, and individual patients Labeling • Key concepts: Pharmacokinetics (PK ) – “what the body does to the drug” • Pharmacodynamics (PD ) – “what the drug does to the body” • An outstanding overview: “Pharmacokinetics and pharmacodynamics ,” by D.M. Giltinan, in Encyclopedia of , 2nd edition

6 Pharmacokinetics and pharmacodynamics

dose concentration response - - - ...... PK PD ...... ¡ ...... ¡ ......

7 Pharmacokinetics and pharmacodynamics

Dosing regimen: Achieve therapeutic objective while minimizing toxicity and difficulty of administration How much ? How often ? To whom ? Under what conditions ? • Information on this: Pharmacokinetics Study of how the drug moves through the body and the processes • that govern this movement

(Elimination = metabolism and excretion )

8 Pharmacokinetics and pharmacodynamics

Basic assumptions and principles: There is an “effect site ” where drug will have its effect • Magnitudes of response and toxicity depend on drug concentration • at the effect site Drug cannot be placed directly at effect site, must move there • Concentrations at the effect site are determined by ADME • Concentrations must be kept high enough to produce a desirable • response, but low enough to avoid toxicity

= “Therapeutic window ” ⇒ (Usually ) cannot measure concentration at effect site directly, but • can measure in blood/plasma/serum; reflect those at site

9 Pharmacokinetics and pharmacodynamics

Pharmacokinetics (PK): First part of the story Broad goal of PK analysis : Understand and characterize • intra-subject ADME processes of drug absorption , distribution , metabolism and excretion (elimination ) governing achieved drug concentrations ...and how these processes vary across subjects (inter-subject • variation )

10 Pharmacokinetics and pharmacodynamics

PK studies in humans: “Intensive studies ” Small number of subjects (often healthy volunteers ) • Frequent samples over time, often following single dose • Usually early in drug development • Useful for gaining initial information on “typical ” PK behavior in • humans and for identifying an appropriate PK model... Preclinical PK studies in animals are generally intensive studies •

11 Pharmacokinetics and pharmacodynamics

PK studies in humans: “Population studies ” Large number of subjects (heterogeneous patients) • Often in later stages of drug development or after a drug is in • routine use Haphazard , sparse over time, multiple dosing intervals • Extensive demographic and physiologic characteristics • Useful for understanding associations between patient characteristics • and PK behavior = tailored dosing recommendations ⇒

12 Pharmacokinetics and pharmacodynamics

Theophylline study: 12 subjects, same oral dose (mg/kg) Theophylline Concentration (mg/L) Theophylline 0 2 4 6 8 10 12

0 5 10 15 20 25 Time (hours)

13 Pharmacokinetics and pharmacodynamics

Features: Intensive study • Similarly shaped concentration-time profiles across subjects • ...but peak, rise, decay vary • Attributable to inter-subject variation in underlying PK behavior • (absorption, distribution, elimination)

Standard: Represent the body by a simple system of compartments Gross simplification but extraordinarily useful. . . •

14 Pharmacokinetics and pharmacodynamics

One-compartment model with first-order absorption, elimination:

oral dose D - A(t) - ka ke

dA(t) = k A (t) k A(t), A(0) = 0 dt a a − e dA (t) a = k A (t), A (0) = D dt − a a a

F = bioavailability, Aa(t) = amount at absorption site

A(t) k DF Concentration at t : m(t)= = a exp( k t) exp( k t) , V V (k k ){ − e − − a } a − e ke = Cl/V, V = “volume ” of compartment, Cl = clearance

15 Pharmacokinetics and pharmacodynamics

One-compartment model for theophylline: Single “blood compartment ” with fractional rates of absorption k • a and elimination ke Deterministic mathematical model • Individual PK behavior characterized by PK parameters • ′ θ =(ka,V,Cl)

By-product: The PK model assumes PK processes are dose-independent • = Knowledge of the values of θ =(k ,V,Cl)′ allows • ⇒ a determination of concentrations achieved at any time t under different doses Can be used to develop dosing regimens •

16 Pharmacokinetics and pharmacodynamics

Objectives of analysis: Estimate “typical ” values of θ =(k ,V,Cl)′ and how they vary in • a the population of subjects based on the longitudinal concentration data from the sample of 12 subjects = Must incorporate the (theoretical ) PK model in an appropriate • ⇒ (somehow...)

17 Pharmacokinetics and pharmacodynamics

Quinidine population PK study: N = 136 patients undergoing treatment with oral quinidine for atrial fibrillation or arrhythmia Demographic/physiologic characteristics : Age, weight, height, • ethnicity/race, smoking status, ethanol abuse, congestive heart

failure, creatinine clearance, α1-acid glycoprotein concentration, . . . Samples taken over multiple dosing intervals = • ⇒ (dose time, amount) = (sℓ,Dℓ) for the ℓth dose interval Standard assumption: “Principle of superposition ” = multiple • ⇒ doses are “additive ” One compartment model gives expression for concentration • at time t...

18 Pharmacokinetics and pharmacodynamics

For a subject not yet at a steady state:

A (s ) = A (s − )exp k (s s − ) + D , a ℓ a ℓ 1 {− a ℓ − ℓ 1 } ℓ ka m(s ) = m(s − )exp k (s s − ) + A (s − ) ℓ ℓ 1 {− e ℓ − ℓ 1 } a ℓ 1 V (k k ) a − e exp k (s s − ) exp k (s s − ) . × {− e ℓ − ℓ 1 }− {− a ℓ − ℓ 1 } h i k m(t) = m(s )exp k (t s ) + A (s ) a ℓ {− e − ℓ } a ℓ V (k k ) a − e exp k (t s ) exp k (t s ) , s

Objective of analysis: Characterize typical values of and variation in ′ θ =(ka,V,Cl) across the population and elucidate systematic associations between θ and patient characteristics

19 Pharmacokinetics and pharmacodynamics

Data for a representative subject:

time conc. dose age weight creat. glyco. (hours) (mg/L) (mg) (years) (kg) (ml/min) (mg/dl) 0.00 – 166 75 108 > 50 69 6.00 – 166 75 108 > 50 69 11.00 – 166 75 108 > 50 69 17.00 – 166 75 108 > 50 69 23.00 – 166 75 108 > 50 69 27.67 0.7 – 75 108 > 50 69 29.00 – 166 75 108 > 50 94 35.00 – 166 75 108 > 50 94 41.00 – 166 75 108 > 50 94 47.00 – 166 75 108 > 50 94 53.00 – 166 75 108 > 50 94 65.00 – 166 75 108 > 50 94 71.00 – 166 75 108 > 50 94 77.00 0.4 – 75 108 > 50 94 161.00 – 166 75 108 > 50 88 168.75 0.6 – 75 108 > 50 88

height=72 inches, Caucasian, smoker, no ethanol abuse, no CHF

20 Pharmacokinetics and pharmacodynamics

Pharmacodynamics (PD): Second part of the story What is a “good ” drug concentration? • What is the “therapeutic window ?” Is it wide or narrow ? Is it the • same for everyone ? Relationship of response to drug concentration • PK/PD study : Collect both concentration and response data from • each subject

21 Pharmacokinetics and pharmacodynamics

Argatroban PK/PD study: Anticoagulant N = 37 subjects assigned to different constant infusion rates • (doses) of 1 to 5 µ/kg/min of argatroban Administered by intravenous infusion for 4 hours (240 min) • PK (blood samples) at • (30,60,90,115,160,200,240,245,250,260,275,295,320) min PD : additional samples at 5–9 time points, measured activated • partial thromboplastin time (aPTT, the response)

Effect site: The blood

22 Pharmacokinetics and pharmacodynamics

Infusion rate 1.0 µg/kg/min Infustion rate 4.5 µg/kg/min Argatroban Concentration (ng/ml) Argatroban Concentration (ng/ml) Concentration Argatroban 0 200 400 600 800 1000 1200 0 200 400 600 800 1000 1200

0 100 200 300 0 100 200 300

Time (min) Time (min)

23 Pharmacokinetics and pharmacodynamics

Argatroban PK model: One-compartment model with constant intravenous infusion rate D (µg/kg/min) for duration tinf = 240 min

D Cl Cl ′ mPK (t)= exp (t t ) exp t , θ =(Cl,V ) Cl − V − inf + − − V      x = 0 if x 0 and x = x if x> 0 + ≤ + PK parameters : θPK =(Cl,V )′ • Estimate “typical ” values of θPK =(Cl,V )′ and how they vary in • the population of subjects Understand relationship between achieved concentrations and • response (pharmacodynamics ...)

24 Pharmacokinetics and pharmacodynamics Argatroban Concentration (ng/ml) Argatroban Concentration (ng/ml) Concentration Argatroban 0 200 400 600 800 1000 1200 0 200 400 600 800 1000 1200

0 100 200 300 0 100 200 300

Time (minutes) Time (minutes)

25 Pharmacokinetics and pharmacodynamics

Response-concentration for 4 subjects:

Subject 15 Subject 19

aPTT (seconds) 0 20 40 60 80 100 0 20 40 60 80 100

0 500 1000 1500 2000 2500 0 500 1000 1500 2000 2500

Subject 28 Subject 33

aPTT (seconds) 0 20 40 60 80 100 0 20 40 60 80 100

0 500 1000 1500 2000 2500 0 500 1000 1500 2000 2500 Predicted argatroban conc. (ng/ml) Predicted argatroban conc. (ng/ml)

26 Pharmacokinetics and pharmacodynamics

Argatroban PD model: “Emax model ”

PD Emax E0 m (t)= E0 + − PK 1+ EC50/m (t) Response at time t depends on concentration at effect site at time t • (same as concentration in blood here) PD parameters : θPD =(E , E , EC )′ • 0 max 50 Also depends on PK parameters • Estimate “typical ” values of θPD =(E , E , EC )′ and how • 0 max 50 they vary in the population of subjects

27 Pharmacokinetics and pharmacodynamics

Ultimate objective: Put PK and PD together to Characterize the “therapeutic window ” and how it varies across • subjects Develop dosing regimens targeting achieved concentrations leading • to therapeutic response For population, subpopulations, individuals • Decide on dose(s) to carry forward to future studies •

28 Pharmacokinetics and pharmacodynamics

Summary: Common themes An outcome (or outcomes) evolves over time; e.g., concentration in • PK, response in PD Interest focuses on underlying mechanisms/processes taking place • within an individual leading to outcome trajectories and how these vary across the population A (usually deterministic ) model is available representing • mechanisms explicitly by scientifically meaningful model parameters Mechanisms cannot be observed directly • = Inference on mechanisms must be based on repeated • ⇒ measurements of the outcome(s) over time on each of a sample of N individuals from the population

29 Pharmacokinetics and pharmacodynamics

Other application areas: Toxicokinetics (Physiologically-based pharmacokinetic – PBPK – • models) HIV dynamics • Stability testing • Agriculture • Forestry • Dairy science • Cancer dynamics • More ... •

30 Model formulation

Nonlinear mixed effects model: Embed the (deterministic ) model describing individual trajectories in a statistical model Formalizes knowledge and assumptions about variation in outcomes • and mechanisms within and among individuals Provides a framework for inference based on repeated measurement • data from N individuals For simplicity : Focus on univariate outcome (= drug concentration • in PK); multivariate (PK/PD ) outcome later

Basic set-up: N individuals from a population of interest, i = 1,...,N For individual i, observe n measurements of the outcome • i

Yi1,Yi2,...,Yini at times ti1, ti2,...,tini

I.e., for individual i, Y at time t , j = 1,...,n • ij ij i

31 Model formulation

Within-individual conditions of observation: For individual i, U i Theophylline : U = D = oral dose for i at time 0 (mg/kg) • i i Argatroban : U =(D , t ) = infusion rate and duration for i • i i inf Quinidine : For subject i observed over di dosing intervals, U i has • ′ elements (siℓ,Diℓ) , ℓ = 1,...,di U are “within-individual covariates ” – needed to describe • i outcome-time relationship at the individual level

32 Model formulation

Individual characteristics: For individual i, Ai Age, weight, ethnicity, smoking status, renal function, etc... • For now : Elements of A do not change over observation period • i A are “among-individual covariates ” – relevant only to how • i individuals differ but are not needed to describe outcome-time relationship at the individual level

′ ′ ′ Observed data: (Y i, Xi) , i = 1,...,N, assumed independent across i Y =(Y ,...,Y )′ • i i1 ini ′ ′ X =(U , A )′ = combined within- and among-individual • i i i covariates (for brevity later)

Basic model: A two-stage hierarchy

33 Model formulation

Stage 1 – Individual-level model:

Y = m(t , U , θ )+ e , j = 1,...,n , θ ( 1) ij ij i i ij i i × E.g., for theophylline (F 1) • ≡ k D m(t, U , θ )= ai i exp( Cl t/V ) exp( k t) i i V (k Cl /V ){ − i i − − ai } i ai − i i ′ ′ θi =(kai, Vi, Cli) =(θi1,θi2,θi3) , r = 3, U i = Di

Assume e = Y m(t , U , θ ) satisfy E(e U , θ ) = 0 • ij ij − ij i i ij | i i = E(Y U , θ )= m(t , U , θ ) for each j ⇒ ij | i i ij i i Standard assumption : e and hence Y are conditionally normally • ij ij distributed (on U i, θi) More shortly... •

34 Model formulation

Stage 2 – Population model:

θ = d(A , β, b ), i = 1,...,N, (r 1) i i i × d is r-dimensional function describing relationship between θ and • i Ai in terms of ... β (p 1) fixed parameter (“fixed effects ”) • × b (q 1) “random effects ” • i × Characterizes how elements of θ vary across individual due to • i – Systematic associations with Ai (modeled via β)

– “Unexplained variation ” in the population (represented by bi) Usual assumptions : • E(b A )= E(b )= 0 and Cov(b A )= Cov(b )= G, b N(0, G) i | i i i | i i i ∼

35 Model formulation

Stage 2 – Population model:

θi = d(Ai, β, bi), i = 1,...,N

′ Example: Quinidine, θi =(kai, Vi, Cli) (r = 3) A =(w ,δ , a )′, w = weight, , a = age, • i i i i i i δi = I(creatinine clearance > 50 ml/min) b =(b ,b ,b )′ (q = 3), β =(β ,...,β )′ (p = 7) • i i1 i2 i3 1 7

kai = θi1 = d1(Ai, β, bi) = exp(β1 + bi1),

Vi = θi2 = d2(Ai, β, bi) = exp(β2 + β4wi + bi2),

Cli = θi3 = d3(Ai, β, bi) = exp(β3 + β5wi + β6δi + β7ai + bi3),

Positivity of k , V , Cl enforced • ai i i If b N(0, G), k , V , Cl are each lognormally distributed in the • i ∼ ai i i population

36 Model formulation

Stage 2 – Population model:

θi = d(Ai, β, bi), i = 1,...,N

′ Example: Quinidine, continued, θi =(kai, Vi, Cli) (r = 3) “Are elements of θ fixed or random effects ?” • i “Unexplained variation ” in one component of θ “small ” relative to • i others – no associated random effect, e.g., r = 3, q = 2

kai = exp(β1 + bi1)

Vi = exp(β2 + β4wi) (all population variation due to weight)

Cli = exp(β3 + β5wi + β6δi + β7ai + bi3)

An approximation – usually biologically implausible ; used for • parsimony, numerical stability

37 Model formulation

Stage 2 – Population model:

θi = d(Ai, β, bi), i = 1,...,N

Allows nonlinear (in β and b ) specifications for elements of θ • i i May be more appropriate than linear specifications (positivity • requirements, skewed distributions) Some accounts: Restrict to linear specification

θi = Aiβ + Bibi

A (r p) “ ” depending on elements of A • i × i B (r q) typically 0s and 1s (identity matrix if r = q) • i × Mainly in the statistical literature •

38 Model formulation

Stage 2 – Linear population model:

θi = Aiβ + Bibi

Example: Quinidine, continued ∗ ∗ ∗ ′ ∗ Reparameterize in terms of θi =(kai, Vi , Cli ) , kai = log(kai), • ∗ ∗ Vi = log(Vi), and Cli = log(Cli) (r = 3)

∗ kai = β1 + bi1, ∗ Vi = β2 + β4wi + bi2, ∗ Cli = β3 + β5wi + β6δi + β7ai + bi3

1000 0 00 1 0 0

Ai =  0 1 0 wi 0 0 0  ,Bi =  0 1 0   001 0 w δ a   0 0 1   i i i       

39 Model formulation

Stage 2 – Population model:

θi = d(Ai, β, bi), i = 1,...,N

Simplest example: Argatroban PK - No among-individual covariates θPK =(Cl , V )′ (r = 2), take • i i i

Cli = exp(β1 + bi1)

Vi = exp(β2 + bi2)

PK ∗ ∗ ′ ∗ Or reparameterize with θi =(Cli , Vi ) , Cli = log(Cli), • ∗ Vi = log(Vi) ∗ Cli = β1 + bi1 ∗ Vi = β2 + bi2 so A = B = (2 2) identity matrix i i ×

40 Model formulation

Within-individual considerations: Complete the Stage 1 individual-level model Assumptions on the distribution of Y given U and θ • i i i Focus on a single individual i observed under conditions U • i Y at times t viewed as intermittent observations on a • ij ij stochastic process

Yi(t, U i)= m(t, U i, θi)+ ei(t, U i) E e (t, U ) U , θ = 0, E Y (t, U ) U , θ = m(t, U , θ ) for all t { i i | i i} { i i | i i} i i Y = Y (t , U ), e = e (t , U ) • ij i ij i ij i ij i “Deviation ” process e (t, U ) represents all sources of variation • i i acting within an individual causing a realization of Yi(t, U i) to

deviate from the “smooth ” trajectory m(t, U i, θi)

41 Model formulation

Conceptualization: Concentration 0 200 400 600 800 1000

0 50 100 150 200 250 300 350 Time

42 Model formulation

Conceptual interpretation: Solid line : m(t, U , θ ) represents “inherent tendency ” for i’s • i i outcome to evolve over time; depends on i’s “inherent

characteristics ” θi Dashed line : Actual realization of the outcome – fluctuates about • solid line because m(t, U i, θi) is a simplification of complex truth Symbols : Actual, intermittent measurements of the dashed line – • deviate from the dashed line due to measurement error Result: Two sources of intra-individual variation “Realization deviation ” • Measurement error variation • m(t, U , θ ) is the average of all possible realizations of measured • i i outcome trajectory that could be observed on i

43 Model formulation

To formalize: ei(t, U i)= eR,i(t, U i)+ eM,i(t, U i) Within-individual stochastic process • Yi(t, U i)= m(t, U i, θi)+ eR,i(t, U i)+ eM,i(t, U i) E e (t, U ) U , θ = E e (t, U ) U , θ = 0 { R,i i | i i} { M,i i | i i} = Y = Y (t , U ), e (t , U )= e , e (t , U )= e • ⇒ ij i ij i R,i ij i R,ij M,i ij i M,ij

Yij = m(tij, U i, θi)+ eR,ij + eM,ij

eij ′ | {z } ′ eR,i =(eR,i1,...,eR,ini ) , eM,i =(eM,i1,...,eM,ini ) e (t, U ) = “realization deviation process ” • R,i i e (t, U ) = “measurement error deviation process ” • M,i i Assumptions on e (t, U ) and e (t, U ) lead to a model for • R,i i M,i i Cov(e U , θ ) and hence Cov(Y U , θ ) i | i i i | i i

44 Model formulation

Conceptualization: Concentration 0 200 400 600 800 1000

0 50 100 150 200 250 300 350 Time

45 Model formulation

Realization deviation process: Natural to expect e (t, U ) and e (s, U ) at times t and s to be • R,i i R,i i positively correlated , e.g.,

corr e (t, U ), e (s, U ) U , θ = exp( ρ t s ), ρ 0 { R,i i R,i i | i i} − | − | ≥ Assume variation of realizations about m(t, U , θ ) are of similar • i i magnitude over time and individuals, e.g.,

Var e (t, U ) U , θ = σ2 0 (constant for all t) { R,i i | i i} R ≥ Or assume variation depends on m(t, U , θ ), e.g., • i i Var e (t, U ) U , θ = σ2 m(t, U , θ ) 2η, η> 0 { R,i i | i i} R{ i i } Result : Assumptions imply a model (n n ) • i × i ′ ′ Cov(e U) , θ )= V (U , θ , α ), α =(σ2 , ρ) or α =(σ2 , ρ, η) R,i | i i R,i i i R R R R R

46 Model formulation

Conceptualization: Concentration 0 200 400 600 800 1000

0 50 100 150 200 250 300 350 Time

47 Model formulation

Measurement error deviation process: Measuring devices commit haphazard errors = • ⇒ corr e (t, U ), e (s, U ) U , θ = 0 for all t>s { M,i i M,i i | i i} Assume magnitude of errors is similar regardless of level, e.g., • Var e (t, U ) U , θ = σ2 0 (constant for all t) { M,i i | i i} M ≥ Or assume magnitude changes with level; often approximated under • assumption Var e (t, U ) U , θ << Var e (t, U ) U , θ { R,i i | i i} { M,i i | i i} Var e (t, U ) U , θ = σ2 m(t, U , θ ) 2ζ , ζ> 0 { M,i i | i i} M { i i } Result : Assumptions imply a covariance model (n n ) • i × i (diagonal matrix )

′ Cov(e U) , θ )= V (U , θ , α ), α = σ2 or α =(σ2 , ζ) M,i | i i M,i i i R M M M M

48 Model formulation

Combining: Standard assumption : e (t, U ) and e (t, U ) are independent • R,i i M,i i

Cov(e U , θ ) = Cov(e U , θ )+ Cov(e U , θ ) i | i i R,i | i i M,i | i i = VR,i(U i, θi, αR)+ VM,i(U i, θi, αM )

= Vi(U i, θi, α)

′ ′ ′ α =(αR, αM ) This assumption may or may not be realistic •

Practical considerations: Quite complex intra-individual covariance models can result from faithful consideration of the situation...... But may be difficult to implement •

49 Model formulation

Standard model simplifications: One or more might be adopted Negligible measurement error = • ⇒

Vi(U i, θi, α)= VR,i(U i, θi, αR)

The t may be at widely spaced intervals = • ij ⇒ among e negligible = V (U , θ , α) is diagonal R,ij ⇒ i i i Var e (t, U ) U , θ << Var e (t, U ) U , θ = • { R,i i | i i} { M,i i | i i} ⇒ measurement error is dominant source Simplifications should be justifiable in the context at hand •

Note: All of these considerations apply to any mixed effects model formulation, not just nonlinear ones!

50 Model formulation

2 2 Routine assumption: Vi(U i, θi, α)= σe Ini α = σe Often made by “default ” with little consideration of the • assumptions it implies ! Assumes autocorrelation among e negligible • R,ij Assumes constant , i.e., Var e (t, U ) U , θ = σ2 • { R,i i | i i} R and Var e (t, U ) U , θ = σ2 = σ2 = σ2 + σ2 { M,i i | i i} M ⇒ e R M If measurement error is negligible = σ2 = σ2 • ⇒ e R If Var e (t, U ) U , θ << Var e (t, U ) U , θ • { R,i i | i i} { M,i i | i i} = σ2 σ2 ⇒ e ≈ M

51 Model formulation

Standard assumptions in PK: Sampling times are sufficiently far apart that autocorrelation among • eR,ij negligible (not always justifiable!) Measurement error dominates realization error so that • Var(e U , θ ) << Var(e U , θ ) R,ij | i i M,ij | i i (often reasonable ) Measurement error depends on level, approximated by • Var(e U , θ )= σ2 m(t , U , θ ) 2ζ M,ij | i i M { ij i i }

so that Vi(U i, θi, α)= VM,i(U i, θi, αM ) is diagonal with these elements (almost always the case) Often, ζ 1 • ≈

52 Model formulation

Distributional assumption: Specification for E(Y U , θ )= m (U , θ ), • i | i i i i i ′ m (U , θ )= m(t , U , θ ),...,m(t , U , θ ) (n 1) i i i { i1 i i ini i i } i × Specification for Cov(Y U , θ )= V (U , θ , α) • i | i i i i i Standard assumption : Distribution of Y given U and θ is • i i i multivariate normal with these moments Alternatively, model on the log scale = Y are conditionally (on • ⇒ ij U i and θi) lognormal In what follows : Y denotes the outcome on the original or • ij transformed scale as appropriate

53 Model formulation

′ ′ ′ Summary of the two-stage model: Recall Xi =(U i, Ai) Substitute population model for θ in individual-level model • i Stage 1 – Individual-level model : • E(Y X , b ) = E(Y U , θ )= m (U , θ ) = m (X , β, b ), i | i i i | i i i i i i i i Cov(Y X , b ) = Cov(Y U , θ )= V (U , θ , α) = V (X , β, b , α) i | i i i | i i i i i i i i Stage 2 – Population model : • θ = d(A , β, b ), b (0, G) i i i i ∼ Standard assumptions : • – Y i given Xi and bi multivariate normal (perhaps transformed ) – b N(0, G) i ∼ – All of these can be relaxed

54 Model interpretation and inferential objectives

“Subject-specific” model: Individual behavior modeled explicitly at

Stage 1, depending on individual-specific parameters θi that have scientifically meaningful interpretation Models for E(Y U , θ ) and θ , and hence E(Y X , b ), • i | i i i i | i i specified ...... in contrast to a “population-averaged ”model, where a model for • E(Y X ) is specified directly i | i This is consistent with the inferential objectives • Interest is in “typical ” values of θ and how they vary in the • i population...... not in the “typical ” response pattern described by a • “population-averaged ”model for E(Y X ) i | i = E(Y X )= E(Y X , b ) p(b ; G) db • ⇒ i | i i | i i i i R 55 Model interpretation and inferential objectives

Main inferential objectives: May be formalized in terms of the model For a specific population model d, the fixed effect β characterizes • the or (“typical ”) value of θi in the population

(perhaps for individuals with given value of Ai) = Determining an appropriate population model d(A , β, b ) and • ⇒ i i inference on elements of β in it is of central interest Variation of θ across individuals beyond that attributable to • i systematic associations with among-individual covariates Ai is described by G (“unexplained variation ”) = Inference on G is of interest (in particular, diagonal elements ) • ⇒

56 Model interpretation and inferential objectives

Additional inferential objectives: In some contexts Inference on θ and/or m(t , U , θ ) at some specific time t for • i 0 i i 0 i = 1,...,N or for future individuals is of interest Example : “Individualized ” dosing in PK • The model is a natural framework for “borrowing strength ” across • similar individuals (more later)

57 Inferential approaches

′ ′ ′ Reminder – summary of the two-stage model: Xi =(U i, Ai) Stage 1 – Individual-level model : • E(Y X , b ) = E(Y U , θ )= m (U , θ )= m (X , β, b ), i | i i i | i i i i i i i i Cov(Y X , b ) = Cov(Y U , θ )= V (U , θ , α)= V (X , β, b , α) i | i i i | i i i i i i i i Stage 2 – Population model : • θ = d(A , β, b ), b (0, G) i i i i ∼ Standard assumptions : • – Y i given Xi and bi multivariate normal (perhaps transformed ) = probability density function f (y x , b ; β, α) ⇒ i i | i i – b N(0, G) = density f(b ; G) i ∼ ⇒ i Observed data : (Y , X ), i = 1,...,N = (Y , X), • { i i } (Y i, Xi) assumed independent across i

58 Inferential approaches

Natural basis for inference on β, G: Maximum likelihood Joint density of Y given X (by independence ) • N ′ ′ ′ f(y x; γ, G)= f (y x ; γ, G), γ =(β , α ) | i i | i i=1 Y f (y , b x ; γ, G)= f (y x , b ; γ)f(b ; G) • i i i | i i i | i i i Log-likelihood for (γ, G) •

N ℓ(γ, G) = log f (y x ; γ, G) i i | i (i=1 ) Y N = log f (y x , b ; γ) f(b ; G) db i i | i i i i (i=1 ) Y Z Involves N q dimensional integrals • −

59 Inferential approaches

N ℓ(γ, G) = log f (y x , b ; γ) f(b ; G) db i i | i i i i (i=1 ) Y Z Major practical issue: These integrals are analytically intractable in general and may be high-dimensional Some of approximation of the integrals required • Analytical approximation (the approach used historically , first by • pharmacokineticists) – will discuss first Numerical approximation (more recent, as computational resources • have improved)

60 Inferential approaches

Inference based on individual estimates: If n r, can (in principle) i ≥ obtain individual regression estimates θi E.g., if V (U , θ , α)= σ2I can use ordinary • i i i e ni b for each i For fancier V (U , θ , α) can use generalized (weighted ) least • i i i squares for each i with an estimate of α substituted α can be estimated by “pooling ” residuals across all N individuals • Realistically : Require n >> r • i Described in Chapter 5 of Davidian and Giltinan (1995) •

Idea: Use the θi, i = 1,...,N, as “data ” to estimate β and G...

b

61 Inferential approaches

Idea: Use the θi, i = 1,...,N, as “data ” to estimate β and G Consider linear population model θ = A β + B b • b i i i i Standard large-n asymptotic theory = • i ⇒ · θ U , θ N(θ , C ), C depends on θ , α i | i i ∼ i i i i · Estimate Cb by substituting θ , α = θ U , θ N(θ , C ) and • i i ⇒ i | i i ∼ i i treat Ci as fixed b b ·b b Write as θ θ + e∗, e∗ U , θ N(0, C ) • b i ≈ i i i | i i ∼ i = Approximate “linear mixed effects model ” for “outcome ” θ • ⇒ b b i ∗ ∗ · θ A β + B b + e , b N(0, G), e U , θ N(0, C ) i ≈ i i i i i ∼ i | i i ∼ b i Canb be fitted (estimate β, G) using standard linear mixed modelb • methods (treating Ci as fixed )

b 62 Inferential approaches

∗ ∗ · θ A β + B b + e , b N(0, G), e U , θ N(0, C ) i ≈ i i i i i ∼ i | i i ∼ i Fittingb the “linear mixed model”: b “Global two-stage algorithm ” (GTS ): Fit using the EM algorithm ; • see Davidian and Giltinan (1995, Chapter 5) Use standard linear mixed model software such as SAS proc • mixed , R function lme – requires some tweaking to handle the fact

that Ci is regarded as known Appeal to usual large-N asymptotic theory for the “linear mixed • b model ” to obtain standard errors for elements of β, confidence intervals for elements of β, etc (generally works well ) b Common misconception: This method is often portrayed in the literature as having no relationship to the nonlinear mixed effects model

63 Inferential approaches

How does this approximate the integrals? Not readily apparent May view the θ as approximate “sufficient statistics ” for the θ • i i Change of variables in the integrals and replace f (y x , b ; γ) by • b i i | i i the (normal) density f(θ U , θ ; α) corresponding to the i | i i asymptotic approximation b Remarks: When all n are sufficiently large to justify the asymptotic • i approximation (e.g., intensive PK studies), I like this method! Easy to explain to collaborators • Gives similar answers to other analytical approximation methods • (coming up) Drawback : No standard software (although see my website for • R/SAS code)

64 Inferential approaches

In many settings: “Rich ” individual data not available for all i

(e.g., population PK studies); i.e., ni “not large ” for some or all i Approximate the integrals more directly by approximating • f (y x ; γ, G) i i | i Write model with normality assumptions at both stages:

Y = m (X , β, b )+ V 1/2(X , β, b , α) ǫ , b N(0, G) i i i i i i i i i ∼ V 1/2 (n n ) such that V 1/2(V 1/2)′ = V • i i × i i i i ǫ X , b N(0,I ) (n 1) • i | i i ∼ ni i × ∗ First-order Taylor series about bi = bi “close” to bi, ignoring • ∗ cross-product (b b )ǫ as negligible = i − i i ⇒ ∗ ∗ ∗ ∗ ∗ Y m (X , β, b ) Z (X , β, b )b +Z (X , β, b )b +V 1/2(X , β, b , α) ǫ i ≈ i i i − i i i i i i i i i i i i ∗ Zi(Xi, β, b )= ∂/∂bi mi(Xi, β, bi) b b∗ i { }| i = i

65 Inferential approaches

∗ ∗ ∗ ∗ ∗ Y m (X , β, b ) Z (X , β, b )b +Z (X , β, b )b +V 1/2(X , β, b , α) ǫ i ≈ i i i − i i i i i i i i i i i i ∗ 0 “First-order” method: Take bi = (mean of bi) = Distribution of Y given X approximately normal with • ⇒ i i

E(Y i Xi) mi(Xi, β, 0), | ≈ ′ Cov(Y X ) Z (X , β, 0) GZ (X , β, 0)+ V (X , β, 0, α) i | i ≈ i i i i i i = Approximate f (y x ; γ, G) by a normal density with these • ⇒ i i | i moments, so that ℓ(γ, G) is in a closed form = Estimate (β, α, G) by maximum likelihood – because integrals • ⇒ are eliminated, is a direct optimization (but still very messy...) First proposed by Beal and Sheiner in early 1980s in the context of • population PK

66 Inferential approaches

“First-order” method: Software fo method in the Fortran package nonmem (widely used by PKists) • SAS proc nlmixed using the method=firo option •

Alternative implementation: View as an approximate “population-averaged ” model for mean and covariance

E(Y i Xi) mi(Xi, β, 0), | ≈ ′ Cov(Y X ) Z (X , β, 0) GZ (X , β, 0)+ V (X , β, 0, α) i | i ≈ i i i i i i = Estimate (β, α, G) by solving a set of generalized estimating • ⇒ equations (GEEs; specifically, “GEE-1 ”) Is a different method from maximum likelihood (“GEE-2 ”) • Software : SAS macro nlinmix with expand=zero •

67 Inferential approaches

Problem: These approximate moments are clearly poor approximations to the true moments In particular, poor approximation to E(Y X ) = biased • i | i ⇒ estimators for β “First-order conditional methods”: Use a “better ” approximation ∗ Take b “closer ”to b • i i Natural choice: b = of the posterior density • i f (y x , b ; γ) f(b ; G) f(b by , x ; γ, G)= i i | i i i i | i i f (y x ; γ, G) i i | i = Approximate moments • ⇒

E(Y i Xi) mi(Xi, β, bi) Zi(Xi, β, bi)bi | ≈ − ′ Cov(Y i Xi) Zi(Xi, β, bi) GZ (Xi, β, bi)+ Vi(Xi, β, bi, α) | ≈ b i b b

68 b b b Inferential approaches

Fitting algorithms: Iterate between

(i) Update bi, i = 1,...,N, by maximizing the posterior density (or approximation to it) with γ and G substituted and held fixed b (ii) Hold the b fixed and update estimation of γ and G by either i b b (a) Maximizing the approximate normal log-likelihood based on b treating Y i given Xi as normal with these moments, OR (b) Solving a corresponding set of GEEs Usually “converges ” (although no guarantee ) •

Software: nonmem with foce option implements (ii)(a) • R function nlme, SAS macro nlinmix with expand=blup option • implement (ii)(b)

69 Inferential approaches

Standard errors, etc: For both “first-order ” approximations Pretend that the approximate moments are exact and use the usual • large-N asymptotic theory for maximum likelihood or GEEs Provides reliable inferences in problems where N is reasonably large • and the magnitude of among-individual variation is not huge My experience: Even without the integration, these are nasty computational • problems , and good starting values for the parameters are required (may have to try several sets of starting values). The “first-order ” approximation is too crude and should be avoided • in general (although can be a good way to get reasonable starting values for other methods) The “first-order conditional ” methods often work well, are • numerically well-behaved , and yield reliable inferences

70 Inferential approaches

N ℓ(γ, G) = log f (y x , b ; γ) f(b ; G) db i i | i i i i (i=1 ) Y Z Numerical approximation methods: Approximate the integrals using deterministic or stochastic numerical integration techniques (q-dimensional numerical integration ) and maximize the log-likelihood Issue : For each iteration of the likelihood optimization algorithm, • must approximate N q-dimensional integrals Infeasible until recently: Numerical integration embedded repeatedly • in an optimization routine is computationally intensive Gets worse with larger q (the “curse of dimensionality ”) •

71 Inferential approaches

Deterministic techniques: Normality of b = Gauss-Hermite quadrature • i ⇒ Quadrature rule : Approximate an integral by a suitable weighted • average of the integrand evaluated at a q-dimensional grid of values = accuracy increases with more grid points, but so does ⇒ computational burden Adaptive Gaussian quadrature : “Center ” and “scale ” the grid • about b = can greatly reduce the number of grid points needed i ⇒ Software: bSAS proc nlmixed Adaptive Gaussian quadrature : The default • Gaussian quadrature : method=gauss noad •

72 Inferential approaches

N ℓ(γ, G) = log f (y x , b ; γ) f(b ; G) db i i | i i i i (i=1 ) Y Z Stochastic techniques: “Brute force ” Monte Carlo integration: Represent integral for i by • B − B 1 f (y x , b(b); γ), i i | i Xb=1 b(b) are draws from N(0, G) (at the current estimates of γ, G) Can require very large B for acceptable accuracy (inefficient ) • Importance sampling : Replace this by a suitably weighted version • that is more efficient Software: SAS proc nlmixed implements importance sampling (method=isamp)

73 Inferential approaches

My experience with SAS proc nlmixed: Good starting values are essential (may have to try many sets) – • starting values are required for all of β, G, α Could obtain starting values from an analytical approximation • method Practically speaking, quadrature is infeasible for q > 2 almost always • with the mechanism-based nonlinear models in PK and other applications

74 Inferential approaches

Other methods: Maximize the log-likelihood via an EM algorithm For nonlinear mixed models , the conditional expectation in the • E-step is not available in a closed form Monte Carlo EM algorithm : Approximate the E-step by ordinary • Monte Carlo integration EM algorithm : Approximate the E-step by • Monte Carlo simulation and stochastic approximation Software : MONOLIX (http://www.monolix.org/) •

75 Inferential approaches

Bayesian inference : Natural approach to hierarchical models

Big picture: In the Bayesian View β, α, G, and b , i = 1,...,N, as random parameters (on • i equal footing) with prior distributions (priors for bi, i = 1,...,N, are N(0, G)) on β and G is based on their posterior • distributions The posterior distributions involve high-dimensional integration and • cannot be derived analytically ...... but samples from the posterior distributions can be obtained via • Markov chain Monte Carlo (MCMC)

76 Inferential approaches

Bayesian hierarchy: Stage 1 – Individual-level model : Assume normality • E(Y X , b ) = E(Y U , θ )= m (U , θ )= m (X , β, b ), i | i i i | i i i i i i i i Cov(Y X , b ) = Cov(Y U , θ )= V (U , θ , α)= V (X , β, b , α) i | i i i | i i i i i i i i Stage 2 – Population model : θ = d(A , β, b ), b N(0, G) • i i i i ∼ Stage 3 – Hyperprior : (β, α, G) f(β, α, G)= f(β)f(α)g(G) • ∼ Joint posterior density • N f (y x , b ; γ) f(b ; G)f(β, α, G) f(γ,G, b y, x)= i=1 i i | i i i ; | f(y x) Q | denominator is numerator integrated wrt (γ,G, bi, i = 1,...,N) E.g., posterior for β, f(β y, x): Integrate out α, G, b , i = 1,...,N • | i

77 Inferential approaches

Estimator for β: Mode of posterior Uncertainty measured by spread of f(β y, x) • | Similarly for α, G, and b , i = 1,...,N • i Implementation: By simulation via MCMC Samples from the full conditional distributions (eventually) behave • like samples from the posterior distributions The mode and measures of uncertainty may be calculated • empirically from these samples Issue : Sampling from some of the full conditionals is not entirely • straightforward because of nonlinearity of m in θi and hence bi = “All-purpose ” software not available in general, but has been • ⇒ implemented for popular m in add-ons to WinBUGS (e.g., PKBugs)

78 Inferential approaches

Experience: With weak hyperpriors and “good ” data, inferences are very similar • to those based on maximum likelihood and first-order conditional methods Convergence of the chain must be monitored carefully; “false • convergence ” can happen Advantage of Bayesian framework : Natural mechanism to • incorporate known constraints and prior scientific knowledge

79 Inferential approaches

Inference on individuals: Follows naturally from a Bayesian perspective Goal : “Estimate ” b or θ for a randomly chosen individual i from • i i the population “Borrowing strength ”: Individuals sharing common characteristics • can enhance inference = Natural “estimator” is the mode of the posterior f(b y, x) or • ⇒ i | f(θ y, x) i | Frequentist perspective : (γ, G) are fixed – relevant posterior is • f (y x , b ; γ) f(b ; G) f(b y , x ; γ, G)= i i | i i i i | i i f (y x ; γ, G) i i | i = substitute estimates for (γ, G) ⇒ θ = d(A , β, b ) • i i i “Empirical Bayes ” • b b b

80 Inferential approaches

Selecting the population model d: Everything so far is predicated on a fixed d(Ai, β, bi) A key objective in many analyses (e.g., population PK) is to identify • an appropriate d(Ai, β, bi) Must identify elements of A to include in each component of • i d(Ai, β, bi) and the functional form of each component Likelihood inference : Use nested hypothesis tests or information • criteria (AIC, BIC, etc) Challenging when A is high-dimensional... • i ...Need a way of selecting among large number of variables and • functional forms in each component (still an open problem...)

81 Inferential approaches

Selecting the population model d: Continued Graphical methods : Based on Bayes or empirical Bayes “estimates” • – Fit an initial population model with no covariates (elements of

Ai and obtain B/EB estimates bi, i = 1 ...,N

– Plot components of bi against elements of Ai, look for b relationships b – Postulate and fit an updated population model d incorporating

relationships and obtain updated B/EB estimates bi and re-plot – If model is adequate, plots should show haphazard scatter ; b otherwise, repeat – Issue 1 : “Shrinkage ” of B/EB estimates could obscure

relationships (especially if bi really aren’t normally distributed ) – Issue 2 : “One-at-a-time ” assessment of relationships could miss important features

82 Inferential approaches

Normality of b : The assumption b N(0, G) is standard in mixed i i ∼ effects model analysis; however Is it always realistic ? • Unmeasured binary among-individual covariate systematically • associated with θ = b has bimodal distribution i ⇒ i Or a normal distribution may just not be the best model! • (heavy tails , ...) Consequences ? • Relaxing the normality assumption: Represent the density of bi by a flexible form Estimate the density along with the model parameters • = Insight into possible omitted covariates • ⇒

83 Applications

Example 1: A basic analysis – argatroban PK N = 37, different infusion rates D for t = 240 min • i inf Y PK = concentrations at t = 30,60,90,115,160,200,240, • ij ij 245,250,260,275,295,320,360 min (ni = 14), One compartment model • ∗ ∗ Cli Cli PK Di e e m (t, U i, θi)= ∗ exp V ∗ (t tinf)+ exp V ∗ t eCli − e i − − − e i      PK ∗ ∗ ′ θi =(Cli , Vi ) , U i =(Di, tinf) x = 0 if x 0 and x = x if x> 0 + ≤ + Cl∗ = log(Cl ), V ∗ = log(V ) • i i i i No among-individual covariates A • i

84 Applications

Profiles for subjects receiving 1.0 and 4.5 µg/kg-min:

Infusion rate 1.0 µg/kg/min Infustion rate 4.5 µg/kg/min Argatroban Concentration (ng/ml) Argatroban Concentration (ng/ml) Argatroban Concentration 0 200 400 600 800 1000 1200 0 200 400 600 800 1000 1200

0 100 200 300 0 100 200 300

Time (min) Time (min)

85 Applications

Nonlinear mixed model: Stage 1 – Individual-level model : Y normal with • ij E(Y U , θ )= mPK (t , U , θ ) ij | i i ij i i Cov(Y U , A )= V (U , θ , α)= σ2 diag ...,mPK (t , U , θ )2ζ,... i | i i i i i e { ij i i } = negligible autocorrelation , measurement error dominates ⇒ Stage 2 – Population model • ′ θ = β + b , β =(β , β ) , b N(0, G) i i 1 2 i ∼ = β , β represent population means of log clearance, volume; ⇒ 1 2 equivalently, exp(β1), exp(β2) are population

= √G , √G coefficients of variation of clearance, volume ⇒ 11 22 ≈

86 Applications

Implementation: Using Individual estimates θ found using “pooled ” generalized least • i squares including estimation of ζ (customized R code) followed by fitting the “linear mixedb model ” (SAS proc mixed) First-order method via version 8.01 of SAS macro nlinmix with • expand=zero – fix ζ = 0.22 (estimate from above) First-order conditional method via version 8.01 of SAS macro • nlinmix with expand=eblup – fix ζ = 0.22 First-order conditional method via R function nlme (estimate ζ) • Maximum likelihood via SAS proc nlmixed with adaptive Gaussian • quadrature (estimate ζ) Code: Can be found at http://www.stat.ncsu.edu/people/davidian/courses/st762/

87 Applications

Method β1 β2 σe ζ G11 G12 G22 Indiv. est. −5.433 −1.927 23.47 0.22 0.137 6.06 6.17 (0.062) (0.026) First-order −5.490 −1.828 26.45 – 0.158 −3.08 16.76 nlinmix (0.066) (0.034) First-order cond. −5.432 −1.926 23.43 – 0.138 5.67 4.76 nlinmix (0.062) (0.026) First-order cond. −5.433 −1.918 20.42 0.24 0.138 6.73 4.56 nlme (0.063) (0.025) ML −5.423 −1.897 11.15 0.34 0.150 13.71 11.26 nlmixed (0.064) (0.038) 3 Values for G12, G22 are multiplied by 10

88 Applications

Interpretation: Based on the individual estimates results Concentrations measured in ng/ml = 1000 µg/ml • Median argatroban clearance 4.4 µg/ml/kg • ≈ ( exp( 5.43) 1000) ≈ − × Median argatroban volume 145.1 ml/kg = 10 liters for • ≈ ⇒ ≈ a 70 kg subject Assuming Cl , V approximately lognormal • i i – G √0.14 100 37% coefficient of variation for clearance 11 ≈ × ≈ – G √0.00617 100 8% CV for volume 22 ≈ × ≈

89 Applications

Individual inference: Individual estimate (dashed) and empirical Bayes estimate (solid) Argatroban Concentration (ng/ml) Argatroban Concentration (ng/ml) Argatroban 0 200 400 600 800 1000 1200 0 200 400 600 800 1000 1200

0 100 200 300 0 100 200 300

Time (minutes) Time (minutes)

90 Applications

Example 2: A simple population PK study analysis: phenobarbital World-famous example • N = 59 preterm infants treated with phenobarbital for seizures • n = 1 to 6 concentration measurements per infant, total of 155 • i Among-infant covariates (A ): Birth weight w (kg), 5-minute • i i Apgar score δi = I[Apgar < 5] Multiple intravenous doses : U = (s ,D ), ℓ = 1,...,d • i iℓ iℓ i One-compartment model (principle of superposition ) • Diℓ Cli m(t, U i, θi)= exp (t siℓ) Vi − Vi − ℓ:Xsiℓ

91 Applications

Dosing history and concentrations for one infant: Phenobarbital conc. (mcg/ml) Phenobarbital conc. 0 20 40 60

0 50 100 150 200 250 300 Time (hours)

92 Applications

Nonlinear mixed model: Stage 1 – Individual-level model • E(Y U , θ )= m(t , U , θ ), Cov(Y U , A )= V (U , θ , α)= σ2 I ij | i i ij i i i | i i i i i e ni = negligible autocorrelation , measurement error dominates and ⇒ has constant variance Stage 2 – Population model • – Without among-infant covariates Ai

log Cli = β1 + bi1, log Vi = β2 + bi2

– With among-infant covariates Ai

log Cli = β1 + β3wi + bi1, log Vi = β2 + β4wi + β5δi + bi2

93 Applications

Empirical Bayes estimates vs. covariates: Fit without

......

-0.5 0.0 0.5 1.0 1.5 . . . Volume random effect . . Clearance random effect Clearance random -0.5 0.0 0.5 1.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 0.5 1.0 1.5 2.0 2.5 3.0 3.5 Birth weight Birth weight -0.5 0.0 0.5 1.0 1.5 Volume random effect Clearance random effect Clearance random Apgar<5 Apgar>=5 -0.5 0.0 0.5 1.0 Apgar<5 Apgar>=5 Apgar score Apgar score

94 Applications

Empirical Bayes estimates vs. covariates: Fit with

...... -0.5 0.0 0.5 . . Volume random effect . -0.2 0.0 0.1 0.2. 0.3 . Clearance random effect Clearance random 0.5 1.0 1.5 2.0 2.5 3.0 3.5 0.5 1.0 1.5 2.0 2.5 3.0 3.5 Birth weight Birth weight -0.5 0.0 0.5 Volume random effect -0.2 0.0 0.1 0.2 0.3 Clearance random effect Clearance random Apgar<5 Apgar>=5 Apgar<5 Apgar>=5 Apgar score Apgar score

95 Applications

Relaxing the normality assumption on bi: Represent the density of bi by a flexible form , fit by maximum likelihood

(a) (b)

10

6

8

5

4 6

h

h

3

4

2

2

1 0

0 0.6 0.4 0.4 0.2 0.2 Volume 0 0.5 Volume 0 0.5 -0.2 -0.2 0 0 -0.4 -0.4 -0.6 -0.5 Clearance -0.5 Clearance

(c) (d)

...... Volume Volume ...... -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 -0.5 0.0 0.5 -0.5 0.0 0.5 Clearance Clearance

96 Extensions

Multivariate outcome: More than one type of outcome measured longitudinally on each individual Objectives: Understand the relationships between the outcome • trajectories and the processes underlying them Key example : Pharmacokinetics/pharmacodynamics (PK/PD) as in • argatroban

97 Extensions

Required: A joint model for PK and PD Data : • – PK PK Yij at times tij (PK concentrations) – PD PD Yij at times tij (PD aPTT responses) PK model with θPK =(Cl∗, V ∗)′, U =(D , t ) • i i i i i inf ∗ ∗ Cli Cli PK PK Di e e m (t, U i, θi )= ∗ exp V ∗ (t tinf)+ exp V ∗ t eCli e i − − − e i      PD model with θPD =(E∗ , E∗ , EC∗ )′ • i 0i max,i 50,i ∗ ∗ E E ∗ e max,i e 0i PD PD E0i m (conc, θi )= e + ∗− 1+ eEC50,i /conc

98 Extensions

Concentration-PD response relationship:

Subject 15 Subject 19

aPTT (seconds) 0 20 40 60 80 100 0 20 40 60 80 100

0 500 1000 1500 2000 2500 0 500 1000 1500 2000 2500

Subject 28 Subject 33

aPTT (seconds) 0 20 40 60 80 100 0 20 40 60 80 100

0 500 1000 1500 2000 2500 0 500 1000 1500 2000 2500 Predicted argatroban conc. (ng/ml) Predicted argatroban conc. (ng/ml)

99 Extensions

Result: Assuming measurement error dominates realization variation, so “true ” PK concentration for i at t m(t, U , θPK ) ≈ i i Stage 1 – Individual-level model • PK PK PK PK PK Yij = m (tij , U i, θi )+ eij Y PD = mPD mPK (tPD, U , θPK ), θPD + ePD ij { ij i i i } ij ePK , ePD mutually independent (primarily measurement error ) • ij ij

100 Extensions

PK′ PD′ ′ Full model: Combined outcomes Y i =(Y i , Y i )

PK′ PD′ ′ ∗ ∗ ∗ ∗ ∗ ′ θi =(θi , θi ) =(Cli , Vi , E0i, Emax,i, EC50i)

Stage 1 – Individual-level model • E(Y PK U , θ )= mPK (tPK , U , θPK ) ij | i i ij i i E(Y PD U , θ )= mPD mPK (tPD, U , θPK ), θPD ij | i i { ij i i i } Cov(Y U , θ )= block diag V PK (U , θ , αPK ), V PD(U , θ , αPD) i| i i { i i i i i i } P K V PK (U , θ , αPK )= σ2 diag ...,mPK(tPK , U , θPK )2ζ ,... i i i e,PK { ij i i } PD V PD(U , θ , αPD)= σ2 diag ...,mPD mPK (tPD, U , θPK ), θPD 2ζ ,... i i i e,PD { ij i i i } Stage 2 – Populationh model i • ′ θ = β + b , β =(β ,...,β ) , b N(0, G) i i 1 5 i ∼

101 Extensions

Implementation: PK Sequentially : Fit PK model = individual estimates θi and • ⇒ PK PK PD predicted concentrations m (t , U i, θ ) at the PD sampling ij i b times b Sequentially : Substitute in PD model and treat as known , fit PD • model Jointly : Fit full model directly •

102 Extensions

Fits: Using nlme

Sequential Joint

β1 −5.433 (0.062) −5.439 (0.061) β2 −1.918 (0.025) −1.894 (0.024) β3 3.361 (0.018) 3.360 (0.019) β4 4.410 (0.045) 4.427 (0.045) β5 6.295 (0.150) 6.362 (0.155) PK PK (σe ,ζ ) (20.42,0.24) (16.66,0.28) P D P D (σe ,ζ ) (0.164,0.80) (0.154,0.82) G11 0.138 0.133 G22 4.56 5.51 G33 8.16 8.51 G44 0.025 0.026 G55 0.377 0.425

3 Values for G22, G33 are multiplied by 10

103 Extensions

Time-dependent among-individual covariates: Among-individual covariates change over time within an individual In principle , one could write θ for each t ; however ... • ij ij Key issue : Does this make scientific sense ? • PK : Do pharmacokinetic processes vary within an individual? • Example: Quinidine study Creatinine clearance, α -acid glycoprotein concentration, etc, • 1 change over dosing intervals How to incorporate dependence of Cl , V on α -acid glycoprotein • i i 1 concentration?

104 Extensions

Data for a representative subject:

time conc. dose age weight creat. glyco. (hours) (mg/L) (mg) (years) (kg) (ml/min) (mg/dl) 0.00 – 166 75 108 > 50 69 6.00 – 166 75 108 > 50 69 11.00 – 166 75 108 > 50 69 17.00 – 166 75 108 > 50 69 23.00 – 166 75 108 > 50 69 27.67 0.7 – 75 108 > 50 69 29.00 – 166 75 108 > 50 94 35.00 – 166 75 108 > 50 94 41.00 – 166 75 108 > 50 94 47.00 – 166 75 108 > 50 94 53.00 – 166 75 108 > 50 94 65.00 – 166 75 108 > 50 94 71.00 – 166 75 108 > 50 94 77.00 0.4 – 75 108 > 50 94 161.00 – 166 75 108 > 50 88 168.75 0.6 – 75 108 > 50 88

height=72 inches, Caucasian, smoker, no ethanol abuse, no CHF

105 Extensions

Population model: Standard approach in PK For subject i: α -acid glycoprotein concentration likely measured • 1 intermittently at times 0, 29, 161 hours and assumed constant over the intervals (0,29), (29,77), (161, ) hours · For intervals I , k = 1,...,a (a = 3 here), A = among-individual • k ik covariates for t I = e.g., ij ∈ k ⇒

θij = Aikβ + bi

This population model assumes “within subject inter-interval • variation ” entirely “explained ” by changes in covariate values Alternatively : Nested random effects •

θij = Aikβ + bi + bik, bi, bik independent

106 Extensions

Multi-level models: More generally Nesting : E.g., outcomes Y , j = 1,...,n , on several trees • ikj ik (k = 1,...,vi) within each of several plots (i = 1,...,N)

θik = Aikβ + bi + bik, bi, bik independent

Missing/mismeasured covariates: Ai, U i, tij

Censored outcome: E.g., due to an assay quantification limit

Semiparametric models: Allow m(t, U i, θi) to depend on an unspecified function g(t, θi) Flexibility , model misspecification • simulation: “Virtual ” subjects simulated from a nonlinear mixed effects model for PK/PD/disease progression linked to a clinical end-point

107 Discussion

Summary: The nonlinear mixed effects model is now a standard statistical • framework in many areas of application Is appropriate when scientific interest focuses on within-individual • mechanisms/processes that can be represented by parameters in a nonlinear (often theoretical ) model for individual time course Free and commercial software is available, but implementation is • still complicated Specification of models and assumptions, particularly the population • model , is somewhat an art-form Current challenge : High-dimensional A (e.g., genomic information) • i Still plenty of methodological research to do •

108 Discussion

See the references on slide 3 for an extensive bibliography

109