Küchenhoff: An exact algorithm for estimating breakpoints in segmented generalized linear models

Sonderforschungsbereich 386, Paper 27 (1996)

Online unter: http://epub.ub.uni-muenchen.de/

Projektpartner

An exact algorithm for estimating breakp oints

in segmented generalized linear mo dels

Helmut Kuc henho

University of Munich Institute of Statistics Akademiestrasse

D Munc hen

Summary

We consider the problem of estimating the unknown breakp oints in seg

mented generalized linear mo dels Exact algorithms for calculating maxi

mum likeliho o d estimators are derived for dierenttyp es of mo dels After

discussing the case of a GLM with a single covariate having one breakp oint

a new algorithm is presented when further covariates are included in the

mo del The essential idea of this approach is then used for the case of more

than one breakp oint As further extension an algorithm for the situation of

two regressors eachhaving a breakp oint is prop osed These techniques are

applied for analysing the data of the Munichrental table It can b e seen that

these algorithms are easy to handle without to o much computational eort

The algorithms are available as GAUSSprograms

Keywords Breakp oint generalized linear mo del segmented regression

Intro duction

In many practical regressiontyp e problems we cannot t one uniform re

gression function to the data since the functional relationship b etween the

resp onse Y and the regressor X changes at certain p oints of the domain of

X These p oints are usually called breakp oints or changep oints One imp or

tant example is the threshold mo del used in epidemiology see Ulm

Kuchenho and Carroll where the covariate X typically an exp osure

has no inuence on Y eg the o ccurrence of a certain disease up to a cer

tain level Thus the relationship b etween X and Y is describ ed by a constant

up to this level and for values of X greater than this level it is given byan

increasing function

In such situations we apply segmented or multiphase regression mo d

els which are obtained by a piecewise denition of the regression function

E Y jX xonintervals of the domain of X Anoverview concerning this

topic can b e found in Chapter of Seb er and Wild Assuming a gen

eralized linear mo del and a known numb er of segments wehave

G x if x

G x if x

  

E Y jX x

G x if x

K K K

and

y b

cy f y j exp

Here is the nuisanceparameter and b E Y jX x see Fahrmeir

and Tutz Seb er and Wild G is the linkfunction eg logistic

identityetcand f denotes the density function of Y given X xFor the

threshold mo del mentioned ab ove for instance there are two segments where

G is the logistic link and The endp oints of the intervals denote the

i

breakp oints Since they are typically unknown they have to b e estimated

For theoretical but also practical reasons the breakp oints are assumed to ly

between the smallest and the largest sample value x i n

i

We further assume that the regression function is continuous ie

i K

i i i i i i

Thus the mo del can b e stated in another parameterisation

K

X

E Y jX x G x x

i i 

i

t if t

where t



if t

From this representation it can b e seen that is a usual generalized linear

mo del if the breakp oints are known Therefore the MLestimation can

i

be performed by a gridsearchtyp e algorithm in case of two segments see

Stasinop oulos and Rigby

For the linear mo del an exact algorithm for the estimator

was given by Hudson see also Schulze or Hawkins In

Section it is shown that this algorithm also works for the GLM with one

breakp oint In Section the algorithm is extended to mo dels with further

covariates In Section the ideas of Section are used to derive the algorithm

for fairly general mo dels with more than one breakp oint and more than one

co variate with breakp oints Giving an algorithm for such general mo dels

we ll a gap existing so far in the literature In Section an example is

considered Weinvestigate the relationship b etween the net rent of ats in

Munich and the at size as well as the age of the ats based on data of the

Munichrental table Finally problems concerning the computing time are

discussed and some interesting additional asp ects are p ointed out

Exact MLestimation for mo dels with one

breakp oint

We consider a GLM with one breakp oint and density The regression

function can then b e written as

E Y jX xG x x witht t

  

Here is the slop e parameter of the rst segment and is the slop e in



segment

The loglikeliho o d function of one observation conditioned on X isgiven

by

y b

G y x x cy

 

where the nuisance parameter is assumed to b e constantover the segments

If G is the natural link function then

x x

 

Having iid observations x y the loglikeliho od function to

i i in

b e maximized in is



n

X

G y x x

i i  i 

i

Since this function is not dierentiable in at x we rst calculate the prole

i

likeliho o d That is we maximize with resp ect to all other parameters and

get

n

X

P max G y x x

i i  i 



i

Obviously mo del is a GLM for xed Thus the calculation of the prole

likeliho o d corresp onds to the MLestimation of a GLM

Since is continuous in it can b e maximized by a grid search see

Ulm For a GLIMmacro see Stasinop oulos and Rigby Though

these algorithms give reliable results if the grid is appropriately chosen it

would b e desirable to have an exact algorithm at ones disp osal

We derivesuch an exact algorithm for maximizing following the ideas

of Hudson Since wehave assumed that the nuisance parameter is

constant for the two segments it can b e neglected in maximizing Let the

observations x y b e ordered with resp ect to x suchthatx x for

i i in i i j

i j The loglikeliho o d is dierentiable with resp ect to



everywhere except for those values of with x for one ifng

i

Therefore the algorithm has to b e divided into roughly two steps according

to the dierentiability of the loglikeliho o d

In the rst step the p oints of dierentiability ie the case x x

k k 

for some k are considered Denoting the partial derivativeofG with

resp ect to the second argumentby G we therefore get



n n

X X

C B

x

i

C B

G y glp G y glp

 i i i i

A

x

i 

i i

I I



fx g fx g

i i

where glp is the broken linear predictor

glp x x

i i  i 

and I denotes the indicator function

Let b e a zero of with x x and The

 k k  

system of equations obtained by equating to results after some algebra

in

k

X

G y x

 i i

i

k

X

G y x x

 i i i

i

n

X

G y x

 i   i

ik 

n

X

G y x x

 i   i i

ik 

with j

j j

From equations and and we conclude that and

resp ectively are MLsolutions of the regressions in the two seg

 

ments Since is uniquely determined by the continuity condition p ossible

zeros with x x can b e determined by estimating the parameters sep

k k 

and x y arately in the two segments based on x y

k i i ik n i i i

which yields and resp ectively The estimator for is then

 

obtained from





If x x it is a zero of In this case the estimator is given by

k k 

If x x we deduce from that there is no lo cal maximum

k k 

with x x

k k 

The ab ovementioned pro cedure is p erformed for the nite number of

intervals x x where these intervals havetobechosen such that the

k k 

MLestimators exist in the corresp onding segments

In the second step we calculate the prole likeliho o d P x i n

i

obtaining the maximum of the loglikeliho o d at all p oints of nondieren

tiability Finally the global maximum of the loglikeliho o d is given bythe

maximum of this nite n umb er of lo cal maxima

Conducting this algorithm the estimation of at most m m GLMs is

needed if there are m observations with dierentvalues of x

Mo dels with covariates

In many practical situations there will b e further covariates in the regression

mo del which leads to the following extension of mo del

E Y jX x Z z G x x z

 

where Z is the vector of covariates with vector of parameters As in Sec

tion x y z denote the corresp onding observations with x x

i i i in i j

for ij

Toderive the MLestimator we rst consider again the case x x

k k 

Then the derivative of the loglikeliho od with resp ect to



is

C B

n n

x

i

X X

C B

C B

G y glp G y glp x

i i  i i i 

C B

A

i i

I I



fx g fx g

i i

z

i

with

glp x x z

i i  i  i

Analogously to we get the following system of equations with

denoting a zero of



k

X

G y x z

 i i

i

i

k

X

G y x z x

 i i i

i

i

n

X

G y x z

 i   i

i

ik 

n

X

G y x z x

 i   i i

i

ik 

k n

X X

G y x z z G y x z z

 i i i  i   i i

i i

i

ik 

with j

j j

Equations corresp ond to a generalized linear mo del with an

typ e design matrix

x z

B C

B C

B C

B C

x z

k k

B C

D

k

B C

x z

k  k 

B C

B C

A

x z

n n

Therefore we obtain zeros of by tting a generalized linear mo del with

design matrix D which again yields b ecause of the continuity condition

k

 



If x x wehave found a lo cal maximum otherwise there is no max

k k 

imum with x x

k k 

The remaining part of the algorithm is now completely analogous to that

presented in Section

Further extensions

Mo dels with more than twosegments

Let us now consider the case of K segments ie Mo del In practical

problems the numb er of segments will b e typically not greater than three

Williams for instance restricts his investigations to this sp ecial case

But even in the situation of a with a normally distributed Y

no complete algorithm for calculating the exact MLestimator can b e found

in the literature Williams states explicitly that his algorithm for three

segments shows certain gaps We describ e the complete algorithm for K

where it will b e formulated such that it can b e directly extended to the case

of K

We start with the generalized linear mo del with two breakp oints in the

following parametrization

E Y jX xG x x x

    

again denote the with and for i Let x y

in  i i i

ordered observations with x x for ij Then we get the derivativeof

i j

the loglikeliho o d with resp ect to as

  

n

X

G y glp x x x I I

 i i i i  i    

fx g fx g

i i 

i

with glp x x x

i i  i   i  

Since the loglikeliho o d is not dierentiable at p oints with x or x

i  i

for some x the domain of is divided into rectangles R with

i  kl

R x x x x k l

kl k k  l l

In the interior of R the loglikeliho o d is dierentiable Equating to

kl

we get after some algebra

n

X

I G y x x

i  i i

fx g

i

i

n

X

G y x x I I

 i   i i

fx g fx g

i i 

i

n

X

G y x x I

 i      i i

g fx

 i

i

Obviously all maxima of the loglikeliho o d corresp ond to the maximaofthe

separate regressions in the three segments As in the preceding sections we

therefore derive the MLsolutions separately for the dierent segments which

are denoted by The continuity assumption yields

   

  

and



  

If x x and x x then a lo cal maximum has b een found

k k   l l

Otherwise the loglikeliho o d function has its maximum value at the

b oundary of the rectangle ie fx x g or fx x g Let for

k k   l l

instance x then the mo del can b e rewritten intro ducing a new covari

i

ate z x as



x E Y jX x G x z

   

Thus we are in the situation of a mo del with one breakp oint and an additional

covariate as discussed in Section To obtain all maxima at the b oundary

of R wecheck successively all p oints x and x as well as

kl k   l

x

 l

This yields the maximizer for the rectangle R Finally the global

kl kl

maximum is obtained as

arg max L

ML kl

kl

Mo dels with two regressors b oth having a break

p oint

As a further extension weallowfortwo regressors with each ofthemhaving

a breakp ointwhich is mo delled as

E Y jX Z G x x z z

     

Even in this case we can essentially pro ceed as ab ove Let x y z

i i i in

denote the ordered observations with x x for ij The observa

i j

tions of the second regressor Z are ordered by a second index z with

i

r

z z r s For deriving the MLestimator of the parameter vector

i i

r s

we again divide the domain of into rect

   

angles R x x z z and consider rst the case where

kr k k  l l 

r r 

lies in the interior of R Using ideas of Section we t a generalized linear

kr

and design matrix mo del with parameter vector

   

D x I x I z I z I

i i i i

fx g fx g fz g fz g

i i i  i 

in

The corresp onding mo del equation is given by

E Y jX x Z z G xI xI

 

fx g fx g

z I z I

 

fz g fz g

 

From the MLestimator we obtain as estimators

   

for using the continuity assumption



 

and



 

If R wehave found a lo cal maximum of the loglikeliho o d since

 kr

the score equations can b e transformed similarly to equations

At the b oundary of the rectangle the ab ove mo del reduces to one with

only a single regressor one breakp ointandacovariate Thus the approach

of Section can b e applied

This leads to the maximizer for each rectangle R Finally the global

kr kr

maximum results from

arg max L

ML kr

kr

Further extensions to more than twocovariates or more than two breakp oints

are straightforward but the corresp onding algorithms b ecome considerably

more complicated and thus require an increasing computational eort

An example the Munichrental table

Rental tables are built up based on surveys in larger cities or communities in

Germany They serve as a formal instrument for rating rents dep ending on

year of construction at size and other covariates For a detailed description

of the data material and the statistical metho ds used we refer to Fahrmeir

Gieger Mathes and Schneewei and F ahrmeir Gieger and Klinger

As a rst approachwe mo del the relationship b etween net rentY and

at size X where the assumption of a breakp oint is justied b ecause smaller

ats are more exp ensive relative to bigger ats Thus we consider the fol

lowing mo del equation

E Y jX x x x

 

Besides the presence of a breakp oint an additional problem o ccurs when

analysing this data caused by heteroscedasticityFollowing Fahrmeir et al

we apply a weighted regression using the weights prop osed there

The results are obtained from the algorithm presented in Section and

are given in Table where the estimated variances are calculated bythe

asymptotic theory derived in Kuc henho

As it can b e seen from these results the breakp ointtakesavalue of ab out



m For smaller ats the estimated slop e of is b elowtheonefor

bigger ats Comparing our results with those gained from a



linear regression no essential dierences can b e stated regarding the t of the

Table Parameter estimates of the weighted broken linear regression mo del

for the Munich rental table In the column the estimated standard

deviations are listed

parameter estimate



data For a more detailed analysis and a comparison with the results from

Fahrmeir Gieger Mathes and Schneewei see K uchenho

In a second step we takeinto account the age of the at Z as addi

tional regressor which is dened as minus year of construction Possible

changes in the way of building ats are reected in the mo del equation by

allowing for a breakp oint in the second regressor age

E Y jX Z x x z z

     

Here the application of the algorithm prop osed in Section yields the

results given in Table

Table Parameter estimates of the weighted broken linear regression mo del

for the Munichrental table and denote the breakp oints b elonging



to at size and age In the column the estimated standard deviations are

listed

parameter estimate







In this case the application of a segmented mo del yields a much b etter t

than the use of a linear mo del without breakp oints The estimated breakp oint

of for the variable age corresp onds to the year Since is not



signicantly dierent from the age of ats built b efore is not essential

for their net rent For ats built after the net rent increases with the

year of construction As already indicated this eect is p ossibly

due to a substantial change in the wayofhow houses were built

Discussion

From our exp erience when analysing data by applying the ab ove algorithms

as for instance in the example presented in Section we can state a go o d

practicability of these algorithms For the Munich rental table where wehave

ab out data p oints the computing time for the mo del with one break

p ointwas ab out minutes on a sunsparc work station For the mo del

with two regressors eachhaving a breakp oint it to ok ab out minutes for

calculating the MLestimators using the algorithm presented in Section

Thus the computing times do not seem to constitute a real problem In

addition algorithms for linear logistic and weighted linear mo dels can b e

obtained on request from the author The algorithms are written in Gauss

and can easily b e rewritten for other generalized linear mo dels

Finallywe should address two imp ortant asp ects related to the prop osed

algorithms

When deriving the ab ove algorithm we assumed that the nuisance pa

rameter is constantover the segments Note that this assumption fails eg

in the linear regression when there are dierentvariances in the segments

In case of this problem do es not o ccur since there is no

nuisance parameter This case of the logistic regression where can

b e treated as known can b e generalized to other mo dels That means alter

natively to the assumption of a constantnuisance parameter the derivation

of the algorithm remains valid if the nuisance parameter itself or the ratio

of the nuisance parameters of the dierent segments is kno wn

One of the most interesting asp ects of our idea for deriving an exact

algorithm concerns the mo dels with covariates The approach prop osed in

this pap er is not only useful for such mo dels for which this approachwas

designed but it is also an essential part of the derivation of the corresp onding

algorithm for the case of more than two segments Thus wewere also able

to solve the problem p ointed out by Williams

References

Fahrmeir L Gieger C and Klinger A Additive dynamic and

multiplicative regression Discussion Paper SFB Munich

Fahrmeir L Gieger C Mathes H and Schneewei H Gutachten

zur Erstel lung des Mietspiegels fur Munchen Teil B Statistische

Analyse der Nettomieten Institute of Statistics LudwigMaximilians

UniversityofMunich

Fahrmeir L and Tutz G Multivariate statistical model ling basedon

generalized linear models SpringerVerlag New York

Hawkins D M Point estimation of the parameters of piecewise

regression mo dels Applied Statistics

Hudson D J Fitting segmented curves whose join p oints haveto

b e estimated Journal of the American Statistical Association

Kuchenho H Schatzmethoden in mehrphasigen Regressionsmo

del len Habilitationsschrift Institute of Statistics LudwigMaximilians

UniversityofMunich

Kuchenho H and Carroll R J Biases in segmented regression

with errors in predictors Statistics in Medicine to app ear

Schulze U Mehrphasenregression Stabilitatsprufung Schatzung

Hypothesenprufung AkademieVerlag Berlin

Seb er G A F and Wild C J John Wiley

Sons New York

Stasinop oulos D M and Rigby R A Detecting break p oints in

generalised linear mo dels Computational Statistics Data Analysis

Ulm K A statistical metho d for assessing a threshold in epidemio

logical studies Statistics in Medicine

Williams D A Discrimination b etween regression mo dels to de

termine the pattern of enzyme synthesis in synchronous cell cultures

Biometrics