Parameter Identifiability and Redundancy: Theoretical Considerations

Mark P. Little1*, Wolfgang F. Heidenreich2, Guangquan Li1

1 Department of Epidemiology and Public Health, Imperial College Faculty of Medicine, St Mary's Campus, London, United Kingdom, 2 Institut für Strahlenschutz, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstrasse, Neuherberg, Germany

Abstract

Background: Models for complex biological systems may involve a large number of parameters. It may well be that some of these parameters cannot be derived from observed data via regression techniques. Such parameters are said to be unidentifiable, the remaining parameters being identifiable. Closely related to this idea is that of redundancy, that a set of parameters can be expressed in terms of some smaller set. Before data is analysed it is critical to determine which model parameters are identifiable or redundant to avoid ill-defined and poorly convergent regression.

Methodology/Principal Findings: In this paper we outline general considerations on parameter identifiability, and introduce the notions of weak local identifiability and gradient weak local identifiability. These are based on local properties of the likelihood, in particular the rank of the Hessian matrix. We relate these to the notions of parameter identifiability and redundancy previously introduced by Rothenberg (Econometrica 39 (1971) 577–591) and Catchpole and Morgan (Biometrika 84 (1997) 187–196). Within the widely used exponential family, parameter irredundancy, local identifiability, gradient weak local identifiability and weak local identifiability are shown to be largely equivalent. We consider applications to a recently developed class of cancer models of Little and Wright (Math Biosciences 183 (2003) 111–134) and Little et al. (J Theoret Biol 254 (2008) 229–238) that generalize a large number of other recently used quasi-biological cancer models.

Conclusions/Significance: We have shown that the previously developed concepts of parameter local identifiability and redundancy are closely related to the apparently weaker properties of weak local identifiability and gradient weak local identifiability—within the widely used exponential family these concepts largely coincide.

Citation: Little MP, Heidenreich WF, Li G (2010) Parameter Identifiability and Redundancy: Theoretical Considerations. PLoS ONE 5(1): e8915. doi:10.1371/journal.pone.0008915

Editor: Fabio Rapallo, University of East Piedmont, Italy

Received: November 23, 2009; Accepted: January 8, 2010; Published: January 27, 2010

Copyright: © 2010 Little et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was funded partially by the European Commission under contracts FI6R-CT-2003-508842 (RISC-RAD) and FP6-036465 (NOTE). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

Introduction

Models for complex biological systems may involve a large number of parameters. It may well be that some of these parameters cannot be derived from observed data via regression techniques. Such parameters are said to be unidentifiable or non-identifiable, the remaining parameters being identifiable. Closely related to this idea is that of redundancy, that a set of parameters can be expressed in terms of some smaller set. Before data is analysed it is critical to determine which model parameters are identifiable or redundant to avoid ill-defined and poorly convergent regression.

Identifiability in stochastic models has been considered previously in various contexts. Rothenberg [1] and Silvey [2] (pp. 50, 81) defined a set of parameters for a model to be identifiable if no two sets of parameter values yield the same distribution of the data. Catchpole and Morgan [3] considered identifiability and parameter redundancy and the relations between them in a general class of (exponential family) models. Rothenberg [1], Jacquez and Perry [4] and Catchpole and Morgan [3] also defined a notion of local identifiability, which we shall extend in the Analysis Section. [There is also a large literature on identifiability in deterministic (rather than stochastic) models, for example the papers of Audoly et al. [5] and Bellu [6], which we shall not consider further.] Catchpole et al. [7] and Gimenez et al. [8] outlined use of computer algebra techniques to determine numbers of identifiable parameters in the exponential family. Viallefont et al. [9] considered parameter identifiability issues in a general setting, and outlined a method based on considering the rank of the Hessian for determining identifiable parameters; however, some of their claimed results are incorrect (as we outline briefly later). Gimenez et al. [8] used Hessian-based techniques, as well as a number of purely numerical techniques, for determining the number of identifiable parameters. Further general observations on parameter identifiability and its relation to properties of sufficient statistics are given by Picci [10], and a more recent review of the literature is given by Paulino and de Bragança Pereira [11].

In this paper we outline some general considerations on parameter identifiability. We shall demonstrate that the concepts of parameter local identifiability and redundancy are closely related to apparently weaker properties of weak local identifiability and gradient weak local identifiability that we introduce in the Analysis Section. These latter properties relate to the uniqueness of likelihood maxima and likelihood turning points within the vicinity of sets of parameter values, and are shown to be based on local properties of the likelihood, in particular the rank of the Hessian matrix.

PLoS ONE | www.plosone.org 1 January 2010 | Volume 5 | Issue 1 | e8915 Parameter Identifiability

Within the widely-used exponential family we demonstrate that these concepts (local identifiability, redundancy, weak local identifiability, gradient weak local identifiability) largely coincide. We briefly consider applications of all these ideas to a recently developed general class of carcinogenesis models [12,13,14], presenting results that generalize those of Heidenreich [15] and Heidenreich et al. [16] in the context of the two-mutation cancer model [17]. These are outlined in the later parts of the Analysis and the Discussion, and in more detail in a companion paper [12].

Analysis

General Considerations on Parameter Identifiability

As outlined in the Introduction, a general criterion for parameter identifiability has been set out by Jacquez and Perry [4]. They proposed a simple linearization of the problem, in the context of models with normal error. They defined a notion of local identifiability, which is that in a local region of the parameter space there is a unique $\theta_0$ that fits some specified body of data $(x_i, y_i)_{i=1}^{n}$, i.e., for which the model-predicted mean $h(x|\theta)$ is such that the residual sum of squares

$$S = \sum_{l=1}^{n} [y_l - h(x_l|\theta)]^2 \qquad (1)$$

has a unique minimum. We present here a straightforward generalization of this to other error structures. If the model prediction $h(x) = h(x|\theta)$ for the observed data $y$ is a function of some vector of parameters $\theta = (\theta_j)_{j=1}^{p}$, then it can in general be assumed, under the general equivalence of likelihood maximization and iteratively reweighted least squares for generalized linear models [18](chapter 2), that one is trying to minimize

$$S = \sum_{l=1}^{n} \frac{1}{v_l}\Big[y_l - h(x_l|\theta_0) - \sum_{j=1}^{p} \frac{\partial h(x_l|\theta)}{\partial \theta_j}\Big|_{\theta=\theta_0} \Delta\theta_j\Big]^2 \qquad (2)$$

where $y_l$ $(1 \le l \le n)$ $(n \ge p)$ is the observed measurement (e.g., the number of observed cases in the case of binomial or Poisson models) at point $l$, and the $v_l$ $(1 \le l \le n)$ are the current estimates of variance at each point. This has a unique minimum in the perturbation $\Delta\theta = (\Delta\theta_j)_{j=1}^{p}$ ($\theta = \theta_0 + \Delta\theta$), given by $H^T D H \Delta\theta = H^T D d$, where $d = (d_l)_{l=1}^{n} = (y_l - h(x_l|\theta_0))_{l=1}^{n}$, $H = (H_{lj})_{l=1,j=1}^{n,p} = \big(\partial h(x_l|\theta)/\partial\theta_j\,|_{\theta=\theta_0}\big)_{l=1,j=1}^{n,p}$ and $D = \mathrm{diag}[1/v_1, 1/v_2, \ldots, 1/v_n]$, whenever $H^T D H$ has full rank ($= p$).

More generally, suppose that the likelihood associated with observation $x_l$ is $l(x_l|\theta)$ and let $L(x_l|\theta) = \ln[l(x_l|\theta)]$. Then, generalizing the least squares criterion (1), we now extend the definition of local identifiability to mean that there is at most one maximum of

$$L = L(x|\theta) = \sum_{l=1}^{n} L(x_l|\theta) \qquad (3)$$

in the neighborhood of any given $\theta \in \Omega \subseteq R^p$. More formally:

Definitions 1. A set of parameters $(\theta_i)_{i=1}^{p}$ is identifiable if for any $\theta \in \Omega$ there is no $\phi \in \Omega \setminus \{\theta\}$ for which $L(x|\phi) = L(x|\theta)$ ($x$ almost everywhere (a.e.)). A set of parameters $(\theta_i)_{i=1}^{p}$ is locally identifiable if there exists a neighborhood $N \ni \theta$ such that for no $\phi \in N \setminus \{\theta\}$ is $L(x|\phi) = L(x|\theta)$ ($x$ a.e.). A set of parameters $(\theta_i)_{i=1}^{p}$ is weakly locally identifiable if there exists a neighborhood $N \ni \theta$ and data $x = (x_1, \ldots, x_n) \in S^n$ such that the log-likelihood $L = L(x|\theta) = \sum_{l=1}^{n} L(x_l|\theta)$ is maximized by at most one set of $\hat\theta \in N$. If $L = L(x|\theta)$ is $C^1$ as a function of $\theta \in \Omega$, a set of parameters $(\theta_i)_{i=1}^{p} \in \mathrm{int}(\Omega)$ is gradient weakly locally identifiable if there exists a neighborhood $N \ni \theta$ and data $x = (x_1, \ldots, x_n) \in S^n$ such that $\big(\partial L(x|\hat\theta)/\partial\hat\theta_i\big)_{i=1}^{p} = 0$ (i.e., $\hat\theta$ is a turning point of $L(x|\theta)$) for at most one set of $\hat\theta \in N$.

Our definitions of identifiability and local identifiability coincide with those of Rothenberg [1], Silvey [2] (pp. 50, 81) and Catchpole and Morgan [3]. Rothenberg [1] proved that if the Fisher information matrix, $I = I(\theta)$, in a neighborhood of $\theta \in \mathrm{int}(\Omega)$ is of constant rank and satisfies various other more minor regularity conditions, then $\theta \in \mathrm{int}(\Omega)$ is locally identifiable if and only if $I(\theta)$ is non-singular. Clearly identifiability implies local identifiability, which in turn implies weak local identifiability. By the Mean Value Theorem [19](p. 107) gradient weak local identifiability implies weak local identifiability. Heuristically, (gradient) weak local identifiability happens when

$$0 = \frac{\partial L}{\partial \theta_i} = \sum_{l=1}^{n} \frac{\partial L(x_l|\theta)}{\partial \theta_i} = \sum_{l=1}^{n}\Big[\frac{\partial L(x_l|\theta_0)}{\partial \theta_i} + \sum_{j=1}^{p} \frac{\partial^2 L(x_l|\theta_0)}{\partial \theta_i \partial \theta_j}\,\Delta\theta_j\Big] + O(|\Delta\theta|^2), \quad 1 \le i \le p \qquad (4)$$

and in general this system of $p$ equations has a unique solution in $\Delta\theta = (\Delta\theta_j)_{j=1}^{p}$ in the neighborhood of $\theta_0$ (assumed $\in \mathrm{int}(\Omega)$) whenever $\Big(\sum_{l=1}^{n} \partial^2 L(x_l|\theta_0)/\partial\theta_i\partial\theta_j\Big)_{i,j=1}^{p}$ has full rank ($= p$). This turns out to be (nearly) the case, and will be proved later (Corollary 2). More rigorously, we have the following result.

Theorem 1. Suppose that the log-likelihood $L(x|\theta)$ is $C^2$ as a function of the parameter vector $\theta \in \Omega \subseteq R^p$, for all $x = (x_1, \ldots, x_n) \in S^n$.

(i) Suppose that for some $x$ and $\theta \in \mathrm{int}(\Omega)$ it is the case that $\mathrm{rk}\Big[\big(\partial^2 L(x|\theta)/\partial\theta_i\partial\theta_j\big)_{i,j=1}^{p}\Big] = p$. Then turning points of the likelihood in the neighborhood of $\theta$ are isolated, i.e., there is an open neighborhood $N \ni \theta$, $N \subseteq \Omega$, for which there is at most one $\hat\theta \in N$ that satisfies $\big(\partial L(x|\theta)/\partial\theta_i\,|_{\theta=\hat\theta}\big)_{i=1}^{p} = 0$.

(ii) Suppose that for some $x$ and $\theta \in \mathrm{int}(\Omega)$ it is the case that $\mathrm{rk}\Big[\big(\partial^2 L(x|\theta)/\partial\theta_i\partial\theta_j\big)_{i,j=1}^{p}\Big] = p$; then local maxima of the likelihood in the neighborhood of $\theta$ are isolated, i.e., there is an open neighborhood $N \ni \theta$, $N \subseteq \Omega$, for which there is at most one $\hat\theta \in N$ that is a local maximum of $L(x|\theta)$.

(iii) Suppose that for some $x$ and all $\theta \in \mathrm{int}(\Omega)$ it is the case that $\mathrm{rk}\Big[\big(\partial^2 L(x|\theta)/\partial\theta_i\partial\theta_j\big)_{i,j=1}^{p}\Big] = r < p$; then all local maxima of the likelihood in $\mathrm{int}(\Omega)$ are not isolated, as indeed are all $\theta \in \mathrm{int}(\Omega)$ for which $\big(\partial L(x|\theta)/\partial\theta_i\big)_{i=1}^{p} = 0$.
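The rank condition of Theorem 1 can be illustrated numerically. The following sketch (ours, not from the paper; the Poisson model with deliberately redundant rate $\lambda = \theta_1\theta_2$ is a hypothetical example) computes a finite-difference Hessian of the log-likelihood at a turning point and checks its rank: because only the product $\theta_1\theta_2$ enters the likelihood, the Hessian is rank-deficient there, so the sufficient condition of Theorem 1 fails, as one would expect for a redundant parameterization.

```python
import numpy as np

def log_lik(theta, x):
    # Poisson log-likelihood with (deliberately redundant) rate
    # lam = theta1 * theta2: only the product is identifiable.
    lam = theta[0] * theta[1]
    return float(np.sum(x * np.log(lam) - lam))

def numerical_hessian(f, theta, eps=1e-4):
    # central finite-difference approximation to the Hessian of f at theta
    p = len(theta)
    H = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            def fs(di, dj, i=i, j=j):
                u = np.array(theta, dtype=float)
                u[i] += di
                u[j] += dj
                return f(u)
            H[i, j] = (fs(eps, eps) - fs(eps, -eps)
                       - fs(-eps, eps) + fs(-eps, -eps)) / (4 * eps ** 2)
    return H

x = np.array([3.0, 5.0, 4.0, 6.0, 2.0])   # sample mean 4
theta0 = [2.0, 2.0]                        # theta1 * theta2 = 4: a turning point
H = numerical_hessian(lambda t: log_lik(t, x), theta0)
rank = np.linalg.matrix_rank(H, tol=1e-4)  # rank 1 < p = 2: rank condition fails
```

Here the analytic Hessian at the turning point is the singular matrix with all entries $-5$, so the numerical rank is 1, one less than the number of parameters, matching the one-dimensional redundancy in $(\theta_1, \theta_2)$.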

We prove this result in Text S1 Section A. As an immediate consequence we have the following result.

Corollary 1. For a given $x = (x_1, \ldots, x_n) \in S^n$, a sufficient condition for the likelihood (3) to have at most one maximum and one turning point in the neighborhood of a given $\theta = (\theta_1, \ldots, \theta_p) \in \mathrm{int}(\Omega)$ is that $\mathrm{rk}\Big[\big(\partial^2 L(x|\theta)/\partial\theta_i\partial\theta_j\big)_{i,j=1}^{p}\Big] = p$. In particular, if this condition is satisfied $\theta$ is gradient weakly locally identifiable (and therefore weakly locally identifiable). ($\Omega \subseteq R^p$ is the parameter space.)

That this condition is not necessary is seen by consideration of the likelihood $l(x|\theta) = C \cdot \exp\big[-\sum_{i=1}^{p} (x_i - \theta_i)^4\big]$, where $C$ is chosen so that this has unit mass. Then $\partial^2 L(x|\theta)/\partial\theta_i\partial\theta_j = -12\,(x_i - \theta_i)^2\,\delta_{ij}$, which has rank 0 at $\theta = x$ and a unique maximum there. In particular, this shows that the result claimed by Viallefont et al. [9](proposition 2, p. 322) is incorrect.

Definitions 2. A subset of parameters $(\theta_{\pi(i)})_{i=1}^{k}$ (for some permutation $\pi: \{1,2,\ldots,p\} \to \{1,2,\ldots,p\}$) is weakly maximal (respectively weakly gradient maximal) if for any permissible fixed $(\theta_{\pi(i)})_{i=k+1}^{p}$ (such that $\Omega_{k,\pi}^{(\theta_{\pi(i)})_{i=k+1}^{p}} = \big\{(\theta_{\pi(i)})_{i=1}^{k} : (\theta_1, \ldots, \theta_k, \theta_{k+1}, \ldots, \theta_p) \in \Omega\big\} \neq \emptyset$) $(\theta_{\pi(i)})_{i=1}^{k}$ is weakly locally identifiable (respectively gradient weakly locally identifiable) at that point, but this is not the case for any larger number of parameters. A subset of parameters $(\theta_{\pi(i)})_{i=1}^{k}$ is strongly maximal (respectively strongly gradient maximal) if for any permissible fixed $(\theta_{\pi(i)})_{i=k+1}^{p}$ and any open $U \subseteq \Omega_{k,\pi}^{(\theta_{\pi(i)})_{i=k+1}^{p}}$, $(\theta_{\pi(i)})_{i=1}^{k}$ restricted to the set $U$ is weakly maximal (respectively weakly gradient maximal), i.e., all $(\theta_{\pi(i)})_{i=1}^{k} \in U$ are weakly maximal (respectively weakly gradient maximal).

From this it easily follows that a strongly (gradient) maximal set of parameters $(\theta_{\pi(i)})_{i=1}^{k}$ is a fortiori weakly (gradient) maximal at all points $(\theta_{\pi(i)})_{i=1}^{k} \in \Omega_{k,\pi}$ for any permissible $(\theta_{\pi(i)})_{i=k+1}^{p}$.

Assume now that $k$ of the $p$ $\theta_i$ are a weakly maximal set of parameters. So for some permutation $\pi: \{1,2,\ldots,p\} \to \{1,2,\ldots,p\}$, for any permissible fixed $(\theta_{\pi(i)})_{i=k+1}^{p}$ and any $(\theta_{\pi(i)})_{i=1}^{k} \in \Omega_{k,\pi} \subseteq R^k$ there is an open neighborhood $N \ni (\theta_{\pi(i)})_{i=1}^{k}$, $N \subseteq \Omega_{k,\pi}$, and some data $x = (x_1, \ldots, x_n) \in S^n$ for which $L_{(\theta_{\pi(i)})_{i=k+1}^{p}}\big(x|(\theta_{\pi(i)})_{i=1}^{k}\big)$ is maximized by at most one set of $(\hat\theta_{\pi(i)})_{i=1}^{k} \in N$, but this is not the case for any larger number of parameters. Assume that $r = \max\Big\{\mathrm{rk}\Big[\big(\partial^2 L_{(\theta_{\pi(i)})_{i=k+1}^{p}}(x|(\theta_{\pi(i)})_{i=1}^{k})/\partial\theta_{\pi(i)}\partial\theta_{\pi(j)}\big)_{i,j=1}^{k}\Big] : (\theta_{\pi(i)})_{i=1}^{k} \in N\Big\} < k$. If $L$ is $C^2$ as a function of $\theta$ then it follows easily that $\Omega_{k,r} = \Big\{(\theta_{\pi(i)})_{i=1}^{k} \in N : \mathrm{rk}\Big[\big(\partial^2 L_{(\theta_{\pi(i)})_{i=k+1}^{p}}(x|(\theta_{\pi(i)})_{i=1}^{k})/\partial\theta_{\pi(i)}\partial\theta_{\pi(j)}\big)_{i,j=1}^{k}\Big] = r\Big\}$ must be an open non-empty subset of $N$. By Theorem 1 (iii) any $\hat\theta \in \Omega_{k,r}$ which maximizes $L_{(\theta_{\pi(i)})_{i=k+1}^{p}}$ in $\Omega_{k,r}$ cannot be isolated, a contradiction (unless there are no maximizing $\hat\theta \in \Omega_{k,r}$). Therefore, either there are no maximizing $\hat\theta \in \Omega_{k,r}$, or for at least one $\hat\theta \in N$, $\mathrm{rk}\Big[\big(\partial^2 L_{(\theta_{\pi(i)})_{i=k+1}^{p}}(x|(\theta_{\pi(i)})_{i=1}^{k})/\partial\theta_{\pi(i)}\partial\theta_{\pi(j)}\big)_{i,j=1}^{k}\Big|_{(\theta_{\pi(i)})_{i=1}^{k} = \hat\theta}\Big] = k$. This implies that $\mathrm{rk}\Big[\big(\partial^2 L(x|\theta)/\partial\theta_i\partial\theta_j\big)_{i,j=1}^{p}\Big|_{\theta = \hat\theta^0}\Big] \ge k$, where $\hat\theta^0 = (\hat\theta) \times \big((\theta_{\pi(i)})_{i=k+1}^{p}\big)$ in the obvious sense.

Assume now that the $(\theta_{\pi(i)})_{i=1}^{k}$ are strongly maximal. Suppose that for some $\theta_1 = (\theta_{1i})_{i=1}^{p} \in \Omega$ and some $x = (x_1, \ldots, x_n) \in S^n$ it is the case that $\mathrm{rk}\Big[\big(\partial^2 L(x|\theta)/\partial\theta_i\partial\theta_j\big)_{i,j=1}^{p}\Big|_{\theta=\theta_1}\Big] > k$. Because $\big(\partial^2 L(x|\theta)/\partial\theta_i\partial\theta_j\big)\big|_{\theta=\theta_1}$ is symmetric, there is a permutation $\pi': \{1,\ldots,p\} \to \{1,\ldots,p\}$ for which $\mathrm{rk}\Big[\big(\partial^2 L(x|\theta)/\partial\theta_{\pi'(i)}\partial\theta_{\pi'(j)}\big)_{i,j=1}^{k+1}\Big|_{\theta=\theta_1}\Big] = k+1$ [20](p. 79). If $L$ is $C^2$ as a function of $\theta$ this will be the case in some open neighborhood $N' \ni (\theta_{1\pi'(i)})_{i=1}^{k+1}$, $N' \subseteq R^{k+1}$. By Theorem 1 (ii) this implies that the $k+1$ parameters $(\theta_{\pi'(i)})_{i=1}^{k+1}$ have at most one maximum in $N'$, so that $(\theta_{\pi(i)})_{i=1}^{k}$ is not a strongly maximal set of parameters in $N'$. With small changes everything above also goes through with ''weakly gradient maximal'' substituted for ''weakly maximal'' and ''strongly gradient maximal'' substituted for ''strongly maximal''. Therefore we have proved the following result.

Theorem 2. Let $L(x|\theta)$ be $C^2$ as a function of $\theta \in \Omega \subseteq R^p$ for all $x \in S^n$.

(i) If there is a weakly maximal (respectively weakly gradient maximal) subset of $k$ parameters, $(\theta_{\pi(1)}, \theta_{\pi(2)}, \ldots, \theta_{\pi(k)})$ (for some permutation $\pi: \{1,2,\ldots,p\} \to \{1,2,\ldots,p\}$), and for fixed $(\theta_{\pi(i)})_{i=k+1}^{p}$ and some $x = (x_1, \ldots, x_n) \in S^n$, $L_{(\theta_{\pi(i)})_{i=k+1}^{p}}\big(x|(\theta_{\pi(i)})_{i=1}^{k}\big)$ has a maximum (respectively turning point) on the set of $(\theta_{\pi(i)})_{i=1}^{k}$ where $\mathrm{rk}\Big[\big(\partial^2 L_{(\theta_{\pi(i)})_{i=k+1}^{p}}(x|(\theta_{\pi(i)})_{i=1}^{k})/\partial\theta_{\pi(i)}\partial\theta_{\pi(j)}\big)_{i,j=1}^{k}\Big]$ is maximal, then $\max\Big\{\mathrm{rk}\Big[\big(\partial^2 L_{(\theta_{\pi(i)})_{i=k+1}^{p}}(x|(\theta_{\pi(i)})_{i=1}^{k})/\partial\theta_{\pi(i)}\partial\theta_{\pi(j)}\big)_{i,j=1}^{k}\Big] : (\theta_{\pi(i)})_{i=1}^{k} \in \Omega_{k,\pi}\Big\} = k$ and $\max\Big\{\mathrm{rk}\Big[\big(\partial^2 L(x|\theta)/\partial\theta_i\partial\theta_j\big)_{i,j=1}^{p}\Big] : \theta \in \Omega\Big\} \ge k$.

(ii) If there is a strongly maximal (respectively strongly gradient maximal) subset of $k$ parameters, $(\theta_{\pi(1)}, \theta_{\pi(2)}, \ldots, \theta_{\pi(k)})$ (for some permutation $\pi: \{1,2,\ldots,p\} \to \{1,2,\ldots,p\}$), then $\mathrm{rk}\Big[\big(\partial^2 L(x|\theta)/\partial\theta_i\partial\theta_j\big)_{i,j=1}^{p}\Big] \le k$ $\forall \theta \in \Omega$.

All further results in this Section assume that the model is a member of the exponential family, so that if the observed data are $x = (x_l)_{l=1}^{n} \in S^n$ then the log-likelihood is given by

$$L(x|\theta) = \sum_{l=1}^{n} \frac{x_l z_l - b(z_l)}{a(\phi)} + c(x_l, \phi)$$

for some functions $a(\phi)$, $b(z)$, $c(x, \phi)$. We assume that the natural parameters $z_l = z_l\big[(\theta_i)_{i=1}^{p}, \zeta_l\big]$ are functions of the model parameters $(\theta_i)_{i=1}^{p}$ and some auxiliary data $\zeta_l$, but that the scaling parameter $\phi$ is not. Let $\mu_l = b'(z_l) = E[x_l]$, so that $\mu_l = b'\big(z_l[(\theta_i)_{i=1}^{p}, \zeta_l]\big)$. In all that follows we shall assume that the function $b(z)$ is $C^2$. The following definition was introduced by Catchpole and Morgan [3].

Definition 3. With the above notation, a set of parameters $(\theta_i)_{i=1}^{p} \in \Omega$ is parameter redundant for an exponential family model if the means $\mu_l = b'\big(z_l[(\theta_i)_{i=1}^{p}, \zeta_l]\big)$ can be expressed in terms of some strictly smaller parameter vector $(r_i)_{i=1}^{q}$ ($q < p$). Otherwise, the set of parameters $(\theta_i)_{i=1}^{p}$ is parameter irredundant or full rank.
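Parameter redundancy in the sense of Definition 3 can be illustrated numerically. In the following sketch (our hypothetical example, not from the paper) a Poisson model has means $\mu_l = \exp(a + b + c\,\zeta_l)$ written redundantly in $\theta = (a, b, c)$, since the means depend only on the two combinations $a + b$ and $c$; the Jacobian $(\partial\mu_l/\partial\theta_i)$ is then rank-deficient, and its rank agrees with that of the Fisher information matrix.

```python
import numpy as np

# Hypothetical redundant Poisson model: mu_l = exp(a + b + c * z_l),
# so only a + b and c are estimable (p = 3 parameters, 2 combinations).
z = np.array([0.0, 0.5, 1.0, 1.5])
a, b, c = 0.3, 0.2, -0.4
mu = np.exp(a + b + c * z)

# Jacobian d mu_l / d theta_i (n x p): columns for a, b, c.
# The first two columns are identical, forcing rank < p.
J = np.column_stack([mu, mu, mu * z])

# Fisher information for Poisson means: I = J^T diag(1/mu) J
I = J.T @ np.diag(1.0 / mu) @ J

rank_J = np.linalg.matrix_rank(J, tol=1e-10)  # 2 < p = 3: parameter redundant
rank_I = np.linalg.matrix_rank(I, tol=1e-10)  # same rank as J
```

The equality `rank_J == rank_I` is a numerical instance of the general identity between the rank of $(\partial\mu_l/\partial\theta_i)$ and the rank of the Fisher information, stated below.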


Catchpole and Morgan [3] proved (their Theorem 1) that a set of parameters is parameter redundant if and only if $\mathrm{rk}\Big[\big(\partial\mu_l/\partial\theta_i\big)_{l=1,i=1}^{n,p}\Big] < p$. They defined full rank models to be essentially full rank if $\mathrm{rk}\Big[\big(\partial\mu_l/\partial\theta_i\big)_{l=1,i=1}^{n,p}\Big] = p$ for every $(\theta_i)_{i=1}^{p} \in \Omega$; if $\mathrm{rk}\Big[\big(\partial\mu_l/\partial\theta_i\big)_{l=1,i=1}^{n,p}\Big] = p$ only for some $(\theta_i)_{i=1}^{p} \in \Omega$ then the parameter set is conditionally full rank. They also showed (their Theorem 3) that if $I = I(\theta)$ is the Fisher information matrix then $\mathrm{rk}\Big[\big(\partial\mu_l/\partial\theta_i\big)_{l=1,i=1}^{n,p}\Big] = \mathrm{rk}[I(\theta)]$, and that parameter redundancy implies lack of local identifiability; indeed their proof of Theorems 2 and 4 showed that there is also lack of weak local identifiability (respectively gradient weak local identifiability) for all $(\theta_i^0)_{i=1}^{p} \in \Omega$ which for some $x = (x_l)_{l=1}^{n} \in S^n$ are local maxima (respectively turning points) of the likelihood.

Assume that $\theta = (\theta_i)_{i=1}^{p}$ are an essentially full rank set of parameters for the model. From the above result, for every $\theta = (\theta_i)_{i=1}^{p} \in \Omega$, $\mathrm{rk}\Big[\big(\partial\mu_l/\partial\theta_i\big)_{l=1,i=1}^{n,p}\Big] = \mathrm{rk}(I(\theta)) = p$. Since $E\Big[\frac{\partial^2 L(x|\theta)}{\partial\theta_i\partial\theta_j}\Big] = -E\Big[\frac{\partial L(x|\theta)}{\partial\theta_i}\frac{\partial L(x|\theta)}{\partial\theta_j}\Big] = -I(\theta)$ is of full rank and so negative definite, we can choose $x = (x_l)_{l=1}^{n} \in S^n$ so that the same is true of

$$\frac{\partial^2 L(x|\theta)}{\partial\theta_i\partial\theta_j} = \sum_{l=1}^{n}\Big[\frac{x_l - b'(z_l)}{a(\phi)}\,\frac{\partial^2 z_l}{\partial\theta_i\partial\theta_j} - \frac{b''(z_l)}{a(\phi)}\,\frac{\partial z_l}{\partial\theta_i}\frac{\partial z_l}{\partial\theta_j}\Big].$$

This implies that on some $N \ni \theta$, $N \subseteq R^p$, $\frac{\partial^2 L(x|\theta)}{\partial\theta_i\partial\theta_j}$ is of full rank, and therefore by Corollary 1 $\theta = (\theta_i)_{i=1}^{p}$ is (gradient) weakly locally identifiable. Furthermore, the above argument shows that if $\theta = (\theta_i)_{i=1}^{p}$ are a conditionally full rank set of parameters then on the (open) set $\Omega_p = \Big\{\theta = (\theta_i)_{i=1}^{p} \in \Omega : \mathrm{rk}\Big[\big(\partial\mu_l/\partial\theta_i\big)_{l=1,i=1}^{n,p}\Big] = p\Big\}$, $\theta = (\theta_i)_{i=1}^{p}$ is gradient weakly locally identifiable. We have therefore proved:

Theorem 3. Let $L(x|\theta)$ belong to the exponential family and be $C^2$ as a function of $\theta \in \Omega \subseteq R^p$ for all $x \in S^n$.

(i) If the parameter set $\theta = (\theta_i)_{i=1}^{p}$ is parameter redundant then it is not locally identifiable, and is not weakly locally identifiable (respectively gradient weakly locally identifiable) for all $(\theta_i^0)_{i=1}^{p} \in \Omega$ which for some $x = (x_l)_{l=1}^{n} \in S^n$ are local maxima (respectively turning points) of the likelihood.

(ii) If the parameter set $\theta = (\theta_i)_{i=1}^{p}$ is of essentially full rank then for some $x = (x_l)_{l=1}^{n} \in S^n$, $\frac{\partial^2 L(x|\theta)}{\partial\theta_i\partial\theta_j}$ is of full rank and therefore $\theta = (\theta_i)_{i=1}^{p}$ is gradient weakly locally identifiable (and so weakly locally identifiable) for all $\theta = (\theta_i)_{i=1}^{p} \in \Omega$.

(iii) If the parameter set $\theta = (\theta_i)_{i=1}^{p}$ is of conditionally full rank then it is gradient weakly locally identifiable on the open set $\Omega_p = \Big\{\theta = (\theta_i)_{i=1}^{p} \in \Omega : \mathrm{rk}\Big[\big(\partial\mu_l/\partial\theta_i\big)_{l=1,i=1}^{n,p}\Big] = p\Big\}$.

Remarks: It should be noted that part (i) of this generalizes part (i) of Theorem 4 of Catchpole and Morgan [3], who proved that if a model is parameter redundant then it is not locally identifiable. However, part (ii) (that essentially full rank implies gradient weak local identifiability) is weaker than the other result, proved in part (ii) of Theorem 4 of Catchpole and Morgan [3], namely that if a model is of essentially full rank it is locally identifiable. As noted by Catchpole and Morgan [3] (pp. 193–4), there are exponential-family models that are conditionally full rank but not locally identifiable, so part (iii) is about as strong a result as can be hoped for.

From Theorem 3 we deduce the following.

Corollary 2. Let $L(x|\theta)$ belong to the exponential family and be $C^2$ as a function of $\theta \in \Omega \subseteq R^p$ for all $x \in S^n$. Then:

(i) If for some subset of parameters $(\theta_{\pi(i)})_{i=1}^{k}$ and some $x = (x_1, \ldots, x_n) \in S^n$ it is the case that $\mathrm{rk}\Big[\big(\partial^2 L(x|\theta)/\partial\theta_{\pi(i)}\partial\theta_{\pi(j)}\big)_{i,j=1}^{k}\Big] = k$, then this subset is gradient weakly locally identifiable at this point.

(ii) If a subset of parameters $(\theta_{\pi(i)})_{i=1}^{k}$ is weakly locally identifiable and for some $x \in S^n$ this point is a local maximum of the likelihood, then it is parameter irredundant, i.e., of full rank, so $\mathrm{rk}[I(\theta)] = k$, so that for some $x' \in S^{n'}$, $\mathrm{rk}\Big[\big(\partial^2 L(x'|\theta)/\partial\theta_{\pi(i)}\partial\theta_{\pi(j)}\big)_{i,j=1}^{k}\Big] = k$. In particular, if this holds for all $\theta \in \Omega$ then parameter irredundancy, local identifiability, gradient weak local identifiability and weak local identifiability are all equivalent.

Proof. This is an immediate consequence of the remarks after Definition 1, Corollary 1, Theorem 3 (i) and Theorems 1 and 3 of Catchpole and Morgan [3]. QED.

Remarks: (i) By the remarks preceding Theorem 3, the conditions of part (i) (that for some $x = (x_1, \ldots, x_n) \in S^n$ it is the case that $\mathrm{rk}\Big[\big(\partial^2 L(x|\theta)/\partial\theta_{\pi(i)}\partial\theta_{\pi(j)}\big)_{i,j=1}^{k}\Big] = k$) are automatically satisfied if $\theta = (\theta_i)_{i=1}^{p}$ are an essentially full rank set of parameters for the model.

(ii) Assume the model is constructed from a stochastic cancer model embedded in the exponential family, in the sense outlined in Text S1 Section B, so that the natural parameters $z_l = z_l\big[(\theta_i)_{i=1}^{p}, \zeta_l\big]$ are functions of the model parameters $(\theta_i)_{i=1}^{p}$ and some auxiliary data $(\zeta_l)_{l=1}^{n}$, and the means are given by $\mu_l = b'\big(z_l[(\theta_i)_{i=1}^{p}, \zeta_l]\big) = \zeta_l \cdot h\big[(\theta_i)_{i=1}^{p}, y_l\big]$, where $h\big[(\theta_i)_{i=1}^{p}, y_l\big]$ is the cancer hazard function. In this case, as shown in Text S1 Section B,

$$\frac{\partial^2 L(x|\theta)}{\partial\theta_i\partial\theta_j} = \sum_{l=1}^{n}\Bigg[\frac{[x_l - b'(z_l)]\,\zeta_l}{a(\phi)\,b''(z_l)}\,\frac{\partial^2 h(\theta, y_l)}{\partial\theta_i\partial\theta_j} - \frac{\zeta_l^2}{a(\phi)}\,\frac{\partial h(\theta, y_l)}{\partial\theta_i}\frac{\partial h(\theta, y_l)}{\partial\theta_j}\,\frac{[b''(z_l)]^2 + b'''(z_l)[x_l - b'(z_l)]}{[b''(z_l)]^3}\Bigg].$$

The second term inside the summation, $-\frac{\zeta_l^2}{a(\phi)}\,\frac{\partial h(\theta, y_l)}{\partial\theta_i}\frac{\partial h(\theta, y_l)}{\partial\theta_j}\,\frac{[b''(z_l)]^2 + b'''(z_l)[x_l - b'(z_l)]}{[b''(z_l)]^3}$, is a rank 1 matrix and can be made small in relation to the first term, e.g., by making $\zeta_l$ small. Therefore finding data $(x, y, \zeta) = (x_1, \ldots, x_n, y_1, \ldots, y_n, \zeta_1, \ldots, \zeta_n) \in S^n$ for which $\mathrm{rk}\Big[\big(\partial^2 L(x|\theta)/\partial\theta_{\pi(i)}\partial\theta_{\pi(j)}\big)_{i,j=1}^{k}\Big] = k$ is equivalent to finding data for which $\mathrm{rk}\Big[\big(\partial^2 h(\theta, y_l)/\partial\theta_{\pi(i)}\partial\theta_{\pi(j)}\big)_{i,j=1}^{k}\Big] = k$, or by the result of Dickson [20](p. 79) for which $\mathrm{rk}\Big[\big(\partial^2 h(\theta, y_l)/\partial\theta_i\partial\theta_j\big)_{i,j=1}^{p}\Big] \ge k$.
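Returning to the counterexample given after Corollary 1, the quartic likelihood $l(x|\theta) = C\,\exp\big[-\sum_i (x_i - \theta_i)^4\big]$ can be checked directly. This sketch (ours; the data values are arbitrary) confirms that the Hessian of the log-likelihood has rank 0 at $\theta = x$ even though $\theta = x$ is the unique maximum, so full Hessian rank is sufficient but not necessary for (gradient) weak local identifiability.

```python
import numpy as np

x = np.array([1.0, -0.5])   # observed data, p = 2 (arbitrary values)

def log_lik(theta):
    # log-likelihood of l(x|theta) = C * exp(-sum (x_i - theta_i)^4),
    # up to the additive constant log C
    return -np.sum((x - theta) ** 4)

def hessian(theta):
    # analytic Hessian: d2L/dtheta_i dtheta_j = -12 (x_i - theta_i)^2 delta_ij
    return np.diag(-12.0 * (x - theta) ** 2)

rank_at_max = np.linalg.matrix_rank(hessian(x))   # 0: Hessian vanishes at theta = x
# ...yet theta = x is still the strict maximum against random perturbations:
rng = np.random.default_rng(0)
strictly_worse = all(log_lik(x + d) < log_lik(x)
                     for d in rng.normal(scale=0.1, size=(100, 2)))
```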

Hessian vs Fisher Information Matrix as a Method of Determining Redundancy and Identifiability in Generalised Linear Models

We, as with Catchpole and Morgan [3], emphasise use of the Hessian of the likelihood rather than the Fisher information matrix considered by Rothenberg [1]. In the context of GLMs, we have $L(x|\theta) = \sum_{l=1}^{n} \frac{x_l z_l - b(z_l)}{a(\phi)} + c(x_l, \phi)$ and $g(\mu_i) = g(b'(z_i)) = \sum_{j=1}^{p} A_{ij}\theta_j$ for some link function $g(\cdot)$ and fixed matrix $A$. We define $D_{ij} = \frac{\partial\mu_j}{\partial\theta_i} = \frac{1}{g'(\mu_j)}A_{ji} = \big(A^T G^{-1}\big)_{ij}$ where $G = \mathrm{diag}\big(g'(\mu_1), g'(\mu_2), \ldots, g'(\mu_n)\big)$. Theorem 1 of Catchpole and Morgan [3] states that a model is parameter irredundant if and only if $\mathrm{rk}[D] = p$. The score vector is given by $U_i = \frac{\partial L(x|\theta)}{\partial\theta_i} = \sum_{l=1}^{n}\frac{[x_l - \mu_l]}{a(\phi)}\frac{\partial z_l}{\partial\theta_i} = \sum_{l=1}^{n}\frac{[x_l - \mu_l]}{b''(z_l)\,a(\phi)}\frac{\partial\mu_l}{\partial\theta_i} = \frac{1}{a(\phi)}\big(D\Delta(x - \mu)\big)_i$, where $\Delta = \mathrm{diag}\Big(\frac{1}{b''(z_1)}, \frac{1}{b''(z_2)}, \ldots, \frac{1}{b''(z_n)}\Big)$. The Fisher information is therefore given by $I(\theta) = E[UU^T] = \frac{1}{a(\phi)^2}\,D\Delta V\Delta D^T$, where $V = \big(E[(x_i - \mu_i)(x_j - \mu_j)]\big)_{i,j}$ is the data variance. Theorem 1 of Rothenberg [1] states that a model is locally identifiable if and only if $\mathrm{rk}[I(\theta)] = p$. As above (Corollary 2 (ii)), heuristically parameter irredundancy, local identifiability, gradient weak local identifiability and weak local identifiability are all equivalent and occur whenever $\mathrm{rk}(D\Delta V\Delta D^T) = \mathrm{rk}(D) = p$. Clearly evaluating the rank of $D$ is generally much easier than that of $D\Delta V\Delta D^T$. Catchpole and Morgan [3] demonstrate use of Hessian-based methods to estimate parameter redundancy in a class of capture-recapture models.

However, for certain applications both the Fisher information and the Hessian must be employed, as we now outline. Assume that the model is constructed from a stochastic cancer model embedded in an exponential family model in the sense outlined in Text S1 Section B. The key to showing that such an embedded model has no more than $N$ irredundant parameters is to construct (as is done in Little et al. [12]) some scalar functions $G_1(\cdot), G_2(\cdot), \ldots, G_N(\cdot)$ such that the cancer hazard function $h(\theta)$ can be written as $h(G_1(\theta), G_2(\theta), \ldots, G_N(\theta))$. Since the cancer model is embedded in a member of the exponential family (in the sense outlined in Text S1 Section B), the same will be true of the total log-likelihood $L(x|\theta) = L(x|G_1(\theta), G_2(\theta), \ldots, G_N(\theta))$. By means of the Chain Rule we obtain

$$\frac{\partial^2 L(x|\theta)}{\partial\theta_i\partial\theta_j} = \sum_{l,k=1}^{N}\frac{\partial^2 L(x|G_1, \ldots, G_N)}{\partial G_l\partial G_k}\,\frac{\partial G_l}{\partial\theta_i}\frac{\partial G_k}{\partial\theta_j} + \sum_{l=1}^{N}\frac{\partial L(x|G_1, \ldots, G_N)}{\partial G_l}\,\frac{\partial^2 G_l}{\partial\theta_i\partial\theta_j},$$

so that the Fisher information matrix is given by:

$$I(\theta) = -E\Big[\frac{\partial^2 L(x|\theta)}{\partial\theta_i\partial\theta_j}\Big] = -E\Big[\sum_{l,k=1}^{N}\frac{\partial^2 L(x|G_1, \ldots, G_N)}{\partial G_l\partial G_k}\,\frac{\partial G_l}{\partial\theta_i}\frac{\partial G_k}{\partial\theta_j}\Big] = -\sum_{l,k=1}^{N}\frac{\partial G_l}{\partial\theta_i}\,E\Big[\frac{\partial^2 L(x|G_1, \ldots, G_N)}{\partial G_l\partial G_k}\Big]\frac{\partial G_k}{\partial\theta_j} \qquad (5)$$

which therefore has rank at most $N$. Therefore by Corollary 2 there can be at most $N$ irredundant parameters, or indeed (gradient) weakly locally identifiable parameters. [A similar argument shows that if one were to reparameterise (via some invertible $C^2$ mapping $\theta = f(\nu)$) then the embedded log-likelihood $L(x|f^{-1}(\theta)) = L(x|\nu)$ associated with $h(f^{-1}(\theta)) = h(\nu)$ must also have Fisher information matrix of rank at most $N$.] By remark (ii) after Corollary 2, to show that a subset of cardinality $N$ of the parameters $(\theta_i)_{i=1}^{p}$ is (gradient) weakly locally identifiable requires that one show that $\Big[\frac{\partial^2 h(\theta, y_l)}{\partial\theta_i\partial\theta_j}\Big]_{i,j=1}^{p}$ has rank at least $N$ for some $(\theta, y_l)$. This is the approach adopted in the paper of Little et al. [12].

Discussion

In this paper we have introduced the notions of weak local identifiability and gradient weak local identifiability, which we have related to the notions of parameter identifiability and redundancy previously introduced by Rothenberg [1] and Catchpole and Morgan [3]. In particular we have shown that within the exponential family parameter irredundancy, local identifiability, gradient weak local identifiability and weak local identifiability are largely equivalent.

The slight novelty of our approach is that the notions of weak local identifiability and gradient weak local identifiability that we introduce are related much more to the Hessian of the likelihood than to the Fisher information matrix that was considered by Rothenberg [1]. However, in practice the two approaches are very similar; Catchpole and Morgan [3] used the Hessian of the likelihood, as do we, because of its greater analytic tractability. The use of this approach is motivated by the application, namely to determine identifiable parameter combinations in a large class of stochastic cancer models, as we outline at the end of the Analysis Section. In certain applications the Fisher information may be best for estimating the upper bound to the number of irredundant parameters, but the Hessian may be best for estimating the lower bound of this quantity.

In the companion paper of Little et al. [12] we consider the problem of parameter identifiability in a particular class of stochastic cancer models, those of Little and Wright [13] and Little et al. [14]. These models generalize a large number of other quasi-biological cancer models, in particular those of Armitage and Doll [21], the two-mutation model [17], the generalized multistage model of Little [22], and a recently developed cancer model of Nowak et al. [23] that incorporates genomic instability. These and other cancer models are generally embedded in an exponential family model in the sense outlined in Text S1 Section B, in particular when cohort data are analysed using Poisson regression models, e.g., as in Little et al. [13,14,24]. As we show at the end of the Analysis Section, proving (gradient) weak local identifiability of a subset of cardinality $k$ of the parameters $(\theta_i)_{i=1}^{p}$ can be done by showing that for this subset of parameters $\mathrm{rk}\Big[\big(\partial^2 h(\theta, y)/\partial\theta_i\partial\theta_j\big)_{i,j=1}^{p}\Big] = k$, where $h$ is the cancer hazard function.

Little et al. [12] demonstrate (by exhibiting a particular parameterization) that there is redundancy in the parameterization for this model: the number of theoretically estimable parameters in the models of Little and Wright [13] and Little et al. [14] is at most two less than the number that are theoretically available, demonstrating (by Corollary 2) that there can be no more than this number of irredundant parameters. Two numerical examples suggest that this bound is sharp – we show that the rank of the Hessian, $\mathrm{rk}\Big[\big(\partial^2 h(\theta, y)/\partial\theta_i\partial\theta_j\big)_{i,j=1}^{p}\Big]$, is two less than the row dimension of this matrix. This result generalizes previously derived results of Heidenreich and others [15,16] relating to the two-mutation model.
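The upper-bound argument recalled above, that a likelihood depending on $p$ parameters only through $N$ scalar combinations has Fisher information of rank at most $N$, can be sketched numerically. In this illustration (ours; the combinations $G_1, G_2$ and the $2 \times 2$ information matrix are hypothetical, not taken from the cancer models of the companion paper) a model with $p = 4$ parameters acts only through $N = 2$ combinations, and the resulting $p \times p$ information matrix, built as in the structure of equation (5), has rank 2.

```python
import numpy as np

# Hypothetical combinations through which the model acts:
# G1 = theta1 * theta2, G2 = theta3 + theta4  (N = 2, p = 4)
theta = np.array([1.2, 0.7, 0.4, -0.1])

# Jacobian dG/dtheta (N x p), written out analytically
JG = np.array([[theta[1], theta[0], 0.0, 0.0],
               [0.0,      0.0,      1.0, 1.0]])

# any positive-definite N x N information matrix in the G coordinates
I_G = np.array([[2.0, 0.3],
                [0.3, 1.5]])

# I(theta) = JG^T I_G JG is p x p but has rank at most N
I_theta = JG.T @ I_G @ JG
rank = np.linalg.matrix_rank(I_theta, tol=1e-10)   # 2 = N < p = 4
```

Consequently no more than two of the four parameters can be irredundant, whatever the data, mirroring the bound on the number of estimable parameters discussed above.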

Acknowledgments

The authors are very grateful for the comments of Professor Byron Morgan on an advanced draft of the paper, and also for the detailed and helpful remarks of a referee.

Author Contributions

Conceived and designed the experiments: MPL WFH. Performed the experiments: MPL GL. Analyzed the data: MPL GL. Wrote the paper: MPL WFH GL.

Supporting Information

Text S1. Found at: doi:10.1371/journal.pone.0008915.s001 (0.33 MB DOC)

References

1. Rothenberg TJ (1971) Identification in parametric models. Econometrica 39: 577–591.
2. Silvey SD (1975) Statistical inference. London: Chapman and Hall. 191 p.
3. Catchpole EA, Morgan BJT (1997) Detecting parameter redundancy. Biometrika 84: 187–196.
4. Jacquez JA, Perry T (1990) Parameter estimation: local identifiability of parameters. Am J Physiol 258: E727–E736.
5. Audoly S, D'Angio L, Saccomani MP, Cobelli C (1998) Global identifiability of linear compartmental models – a computer algebra algorithm. IEEE Trans Biomed Eng 45: 36–47.
6. Bellu G, Saccomani MP, Audoly S, D'Angio L (2007) DAISY: a new software tool to test global identifiability of biological and physiological systems. Computer Meth Prog Biomed 88: 52–61.
7. Catchpole EA, Morgan BJT, Viallefont A (2002) Solving problems in parameter redundancy using computer algebra. J Appl Stat 29: 626–636.
8. Gimenez O, Viallefont A, Catchpole EA, Choquet R, Morgan BJT (2004) Methods for investigating parameter redundancy. Animal Biodiversity Conservation 27.1: 1–12.
9. Viallefont A, Lebreton J-D, Reboulet A-M, Gory G (1998) Parameter identifiability and model selection in capture-recapture models: a numerical approach. Biometrical J 40: 313–325.
10. Picci G (1977) Some connections between the theory of sufficient statistics and the identifiability problem. SIAM J Appl Math 33: 383–398.
11. Paulino CDM, de Bragança Pereira CA (1994) On identifiability of parametric statistical models. J Ital Stat Soc 1: 125–151. (Stat Methods Appl 3).
12. Little MP, Heidenreich WF, Li G (2009) Parameter identifiability and redundancy in a general class of stochastic carcinogenesis models. PLoS ONE 4(12): e8520.
13. Little MP, Wright EG (2003) A stochastic carcinogenesis model incorporating genomic instability fitted to colon cancer data. Math Biosci 183: 111–134.
14. Little MP, Vineis P, Li G (2008) A stochastic carcinogenesis model incorporating multiple types of genomic instability fitted to colon cancer data. J Theoret Biol 254: 229–238; 255: 268.
15. Heidenreich WF (1996) On the parameters of the clonal expansion model. Radiat Environ Biophys 35: 127–129.
16. Heidenreich WF, Luebeck EG, Moolgavkar SH (1997) Some properties of the hazard function of the two-mutation clonal expansion model. Risk Anal 17: 391–399.
17. Moolgavkar SH, Venzon DJ (1979) Two-event models for carcinogenesis: incidence curves for childhood and adult tumors. Math Biosci 47: 55–77.
18. McCullagh P, Nelder JA (1989) Generalized linear models (2nd edition). London: Chapman and Hall. 511 p.
19. Rudin W (1976) Principles of mathematical analysis (3rd edition). Auckland: McGraw Hill. 352 p.
20. Dickson LE (1926) Modern algebraic theories. Chicago: Sanborn. 273 p.
21. Armitage P, Doll R (1954) The age distribution of cancer and a multi-stage theory of carcinogenesis. Br J Cancer 8: 1–12.
22. Little MP (1995) Are two mutations sufficient to cause cancer? Some generalizations of the two-mutation model of carcinogenesis of Moolgavkar, Venzon, and Knudson, and of the multistage model of Armitage and Doll. Biometrics 51: 1278–1291.
23. Nowak MA, Komarova NL, Sengupta A, Jallepalli PV, Shih I-M, et al. (2002) The role of chromosomal instability in tumor initiation. Proc Natl Acad Sci U S A 99: 16226–16231.
24. Little MP, Li G (2007) Stochastic modelling of colon cancer: is there a role for genomic instability? Carcinogenesis 28: 479–487.

Supplementary material A.

Proof of Theorem 1

In this Section we outline a proof of Theorem 1 in the main text. To prove this result we need the following lemma of Rudin [19] (p. 229).

Lemma A1. Suppose $m, n, r$ are non-negative integers s.t. $m \ge r$, $n \ge r$, and $F$ is a $C^1$ function $E \subset \mathbb{R}^n \to \mathbb{R}^m$ where $E$ is an open set. Suppose that $\mathrm{rk}(F'(x)) = r$ $\forall x \in E$. Fix $a \in E$ and put $A = F'(a)$, let $Y_1 = A(\mathbb{R}^n)$, and let $P : \mathbb{R}^m \to \mathbb{R}^m$ be a linear projection operator ($P^2 = P$) s.t. $Y_1 = P(\mathbb{R}^m)$ and $Y_2 = \mathrm{null}(P)$. Then there are open sets $U, V \subset \mathbb{R}^n$ and a bijective $C^1$ function $H : V \to U$ whose inverse is also $C^1$ and s.t. $F(H(x)) = Ax + \varphi(Ax)$ $\forall x \in V$, where $\varphi : AV \subset Y_1 \to Y_2$ is a $C^1$ function.
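As a concrete aside (not part of the lemma or its use below), the projection operator $P$ required here can be constructed numerically. The following Python sketch, using an arbitrary rank-deficient example matrix $A$ of our own choosing, builds an idempotent $P$ whose range is $A(\mathbb{R}^n)$:

```python
import numpy as np

# Build a linear projection P with P @ P = P and range(P) = range(A),
# as required in Lemma A1. A is an arbitrary rank-2 example matrix.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])  # rank 2: third row = row1 + row2

# Orthonormal basis U1 for the column space Y1 = A(R^n) via the SVD.
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))     # numerical rank r
U1 = U[:, :r]                  # basis of Y1

P = U1 @ U1.T                  # orthogonal projection onto Y1

assert np.allclose(P @ P, P)           # P is idempotent: P^2 = P
assert np.allclose(P @ A, A)           # range(A) is fixed by P
assert np.linalg.matrix_rank(P) == r   # rank(P) = r; null(P) = Y2
```

Any idempotent operator with the right range would do; the orthogonal projection is simply the most convenient numerical choice.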

We now restate Theorem 1 here.

Theorem A2. Suppose that the log-likelihood $L(x|\theta)$ is $C^2$ as a function of the parameter vector $\theta \in \Omega \subset \mathbb{R}^p$, and for all $x = (x_1, \ldots, x_n) \in \Sigma^n$.

(i) Suppose that for some $x$ and $\theta \in \mathrm{int}(\Omega)$ it is the case that $\mathrm{rk}\left[\left(\frac{\partial^2 L(x|\theta)}{\partial\theta_i \partial\theta_j}\right)_{i,j=1}^p\right] = p$. Then turning points of the likelihood in the neighborhood of $\theta$ are isolated, i.e., there is an open neighborhood $\theta \in N \subset \Omega$ for which there is at most one $\tilde\theta \in N$ that satisfies $\left(\frac{\partial L(x|\theta)}{\partial\theta_i}\right)_{i=1}^p \Big|_{\theta = \tilde\theta} = 0$.

(ii) Suppose that for some $x$ and $\theta \in \mathrm{int}(\Omega)$ it is the case that $\mathrm{rk}\left[\left(\frac{\partial^2 L(x|\theta)}{\partial\theta_i \partial\theta_j}\right)_{i,j=1}^p\right] = p$. Then local maxima of the likelihood in the neighborhood of $\theta$ are isolated, i.e., there is an open neighborhood $\theta \in N \subset \Omega$ for which there is at most one $\tilde\theta \in N$ that is a local maximum of $L(x|\theta)$.

(iii) Suppose that for some $x$ and all $\theta \in \mathrm{int}(\Omega)$ it is the case that $\mathrm{rk}\left[\left(\frac{\partial^2 L(x|\theta)}{\partial\theta_i \partial\theta_j}\right)_{i,j=1}^p\right] = r < p$. Then no local maxima of the likelihood are isolated; nor, indeed, is any $\theta \in \mathrm{int}(\Omega)$ for which $\left(\frac{\partial L(x|\theta)}{\partial\theta_i}\right)_{i=1}^p = 0$.
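Before giving the proof, part (i) can be illustrated numerically. The following Python sketch uses a toy quadratic log-likelihood of our own choosing (not from the paper): its Hessian has full rank $p = 2$ everywhere, and Newton's method finds the single isolated turning point.

```python
import numpy as np

# Toy log-likelihood: L(theta) = -(theta1 - 1)^2 - 2*(theta2 + 0.5)^2.
# Its Hessian diag(-2, -4) has full rank p = 2 everywhere (case (i)),
# and the unique turning point is theta = (1, -0.5).
def score(theta):
    """Gradient of the toy log-likelihood."""
    return np.array([-2.0 * (theta[0] - 1.0),
                     -4.0 * (theta[1] + 0.5)])

H = np.diag([-2.0, -4.0])                # constant Hessian
assert np.linalg.matrix_rank(H) == 2     # full rank: Theorem A2(i) applies

# Newton's method converges to the single turning point from any start.
theta = np.array([5.0, 5.0])
for _ in range(50):
    theta = theta - np.linalg.solve(H, score(theta))

assert np.allclose(theta, [1.0, -0.5])   # the isolated turning point
assert np.allclose(score(theta), 0.0)
```

Full rank of the Hessian is exactly what makes the Newton step well defined and the turning point locally unique.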

Proof:

(i) Let $F : \Omega \subset \mathbb{R}^p \to \mathbb{R}^p$ be defined by $F(\theta_1, \theta_2, \ldots, \theta_p) = \left(\frac{\partial L(x|\theta)}{\partial\theta_1}, \frac{\partial L(x|\theta)}{\partial\theta_2}, \ldots, \frac{\partial L(x|\theta)}{\partial\theta_p}\right)$. Since $L$ is $C^2$, $F$ is $C^1$ on $\mathrm{int}(\Omega) \subset \mathbb{R}^p$. By assumption $\frac{\partial F_i(\theta)}{\partial\theta_j} = \left(\frac{\partial^2 L(x|\theta)}{\partial\theta_i \partial\theta_j}\right)_{i,j=1}^p$ is of full rank at $\theta$. By the inverse function theorem [19] (pp. 221–223) there are open $N, M \subset \mathbb{R}^p$ such that $\theta \in N$ and a $C^1$ bijective function $G : M \to N$ such that $G(F(\tilde\theta)) = \tilde\theta$ for all $\tilde\theta \in N$. In particular there can be at most a single $\tilde\theta \in N$ for which $F(\tilde\theta) = 0$. QED.

(ii) By (i) there is an open neighborhood $\theta \in N \subset \Omega$ for which if $\tilde\theta \in N$ is such that $\left(\frac{\partial L(x|\theta)}{\partial\theta_i}\right)_{i=1}^p \Big|_{\theta = \tilde\theta} = 0$ then $\left(\frac{\partial L(x|\theta)}{\partial\theta_i}\right)_{i=1}^p \Big|_{\theta = \theta'} \ne 0$ for any $\theta' \ne \tilde\theta$ in $N$. Suppose now that $\tilde\theta \in N$ is a local maximum of $L(x|\theta)$. Any member of this neighborhood other than $\tilde\theta$ cannot be a turning point, and so by the Mean Value Theorem [19] (p. 107) cannot be a local maximum. QED.

(iii) Let $F : \Omega \subset \mathbb{R}^p \to \mathbb{R}^p$ be defined by $F(\theta_1, \theta_2, \ldots, \theta_p) = \left(\frac{\partial L(x|\theta)}{\partial\theta_1}, \frac{\partial L(x|\theta)}{\partial\theta_2}, \ldots, \frac{\partial L(x|\theta)}{\partial\theta_p}\right)$. Since $L$ is $C^2$, $F$ is $C^1$ on $\mathrm{int}(\Omega) \subset \mathbb{R}^p$. By assumption $\mathrm{rk}(F') = \mathrm{rk}\left[\left(\frac{\partial^2 L(x|\theta)}{\partial\theta_i \partial\theta_j}\right)_{i,j=1}^p\right] = r$ for all $\theta \in \mathrm{int}(\Omega) \subset \mathbb{R}^p$. Suppose that $\theta_0 \in \mathrm{int}(\Omega)$ is a local maximum of $L$. Let $A = \frac{\partial F}{\partial\theta}\Big|_{\theta = \theta_0} : \mathbb{R}^p \to \mathbb{R}^p$ ($A \in L(\mathbb{R}^p, \mathbb{R}^p)$), and choose some arbitrary projection $P \in L(\mathbb{R}^p, \mathbb{R}^p)$ s.t. $P(\mathbb{R}^p) = Y_1 = A(\mathbb{R}^p)$, and let $Y_2 = \mathrm{null}(P)$. By Lemma A1 there are open sets $U, V \subset \mathbb{R}^p$ with $\theta_0 \in U \subset \mathrm{int}(\Omega)$ and a bijective $C^1$ mapping $H : V \to U$ with $C^1$ inverse s.t. $F(y) = AH^{-1}(y) + \varphi(AH^{-1}(y))$ for all $y \in U$, where $\varphi : AV \subset Y_1 \to Y_2$ is a $C^1$ function.

Since $\theta_0 \in \mathrm{int}(\Omega)$ is a local maximum of $L(x|\theta)$, by the Mean Value Theorem [19] (p. 107) $F(\theta_0) = 0$. Now choose some non-trivial vector $k \in \mathrm{null}(A)$, as we can since $r < p$, and define a function on some interval, $\delta : (-\varepsilon, \varepsilon) \to \mathbb{R}^p$, by $\delta(t) = H(H^{-1}(\theta_0) + tk)$, taking $\varepsilon$ small enough that $H^{-1}(\theta_0) + tk \in V$. Because $H : V \to U$ is bijective and $k$ is non-trivial, $\delta(t) = \delta(t') \Leftrightarrow t = t'$. Also, since $Ak = 0$, it is the case that:

$$F(\delta(t)) = A[H^{-1}(\theta_0) + tk] + \varphi(A[H^{-1}(\theta_0) + tk]) = A[H^{-1}(\theta_0)] + \varphi(A[H^{-1}(\theta_0)]) = F(\theta_0) = 0 \qquad \text{(A1)}$$

Define $G : (-\varepsilon, \varepsilon) \to \mathbb{R}$ by $G(t) = L(\delta(t)) = L(\delta_1(t), \delta_2(t), \ldots, \delta_p(t))$. By the chain rule [19] (p. 215), $\frac{dG}{dt} = \sum_{i=1}^p \frac{\partial L(x|\theta)}{\partial\theta_i} \frac{d\delta_i}{dt} = 0$ $\forall t \in (-\varepsilon, \varepsilon)$, by (A1). Finally, by the Mean Value Theorem [19] (p. 107) $G$ must be constant; in particular $L(x|\delta(t)) = L(x|\delta(0)) = L(x|\theta_0)$ $\forall t \in (-\varepsilon, \varepsilon)$, and so all points $\delta(t)$ must also be local maxima of $L(x|\theta)$. Therefore $\theta_0$ is not an isolated local maximum. Since all we used about $\theta_0 \in \mathrm{int}(\Omega)$ was that $F(\theta_0) = \left(\frac{\partial L(x|\theta_0)}{\partial\theta_{01}}, \frac{\partial L(x|\theta_0)}{\partial\theta_{02}}, \ldots, \frac{\partial L(x|\theta_0)}{\partial\theta_{0p}}\right) = 0$, the above argument, via $F(\delta(t)) = 0$, also shows that turning points cannot be isolated. QED.
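The mechanism of part (iii) — a rank-deficient Hessian producing a flat direction $\delta(t)$ along which the likelihood is constant — can be illustrated with a deliberately redundant toy log-likelihood of our own construction:

```python
import numpy as np

# Toy example: L depends on theta only through s = theta1 + theta2,
# so p = 2 but the Hessian has rank 1 and maxima are not isolated.
def log_lik(theta):
    s = theta[0] + theta[1]
    return -(s - 1.0) ** 2   # maximized on the whole line theta1 + theta2 = 1

def hessian(f, theta, eps=1e-5):
    """Central-difference Hessian of a scalar function f at theta."""
    p = len(theta)
    H = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            t = np.array(theta, dtype=float)
            t[i] += eps; t[j] += eps; fpp = f(t)
            t[j] -= 2 * eps; fpm = f(t)
            t[i] -= 2 * eps; fmm = f(t)
            t[j] += 2 * eps; fmp = f(t)
            H[i, j] = (fpp - fpm - fmp + fmm) / (4 * eps ** 2)
    return H

theta0 = np.array([0.5, 0.5])             # a maximum: theta1 + theta2 = 1
H = hessian(log_lik, theta0)
rank = np.linalg.matrix_rank(H, tol=1e-4)
print(rank)                               # 1 < p = 2: rank-deficient case (iii)

# Flat direction k in null(H): L is unchanged along delta(t) = theta0 + t*k,
# so theta0 is not an isolated maximum.
k = np.array([1.0, -1.0])
for t in (0.1, 0.5, -0.3):
    assert np.isclose(log_lik(theta0 + t * k), log_lik(theta0))
```

Here the null vector $k$ plays exactly the role of $k \in \mathrm{null}(A)$ in the proof: moving along it leaves the likelihood constant, so every point on the line is a maximum.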

Supplementary material B.

Specification of embedded exponential family model

In this Section we outline the specification of an embedding of a stochastic cancer model in a general class of statistical models, the so-called exponential family [18]. This is often done in fitting cancer models to epidemiological and biological data (e.g., see references [12, 13, 14, 24]). Recall that a model is a member of the exponential family if the observed data $x = (x_l)_{l=1}^n \in \Sigma^n$ is such that the log-likelihood is given by

$$L(x|\theta) = \sum_{l=1}^n \left[\frac{x_l \varsigma_l - b(\varsigma_l)}{a(\phi)} + c(x_l, \phi)\right]$$

for some functions $a(\phi)$, $b(\varsigma)$, $c(x, \phi)$. We assume that the natural parameters $\varsigma_l = \varsigma_l[(\theta_i)_{i=1}^p, z_l]$ are functions of the model parameters $(\theta_i)_{i=1}^p$ and some auxiliary data $(z_l)_{l=1}^n$, and that $\mu_l = b'(\varsigma_l[(\theta_i)_{i=1}^p, z_l]) = z_l \cdot h[(\theta_i)_{i=1}^p, y_l]$. Here $h[(\theta_i)_{i=1}^p, y_l]$ is the cancer hazard function (for example, that of Little et al. [14], as also specified in the main text and in Text S1 Section B of Little et al. [12]), $(y_l)_{l=1}^n$ are some further auxiliary data, and we assume that the $(z_l)_{l=1}^n$ are all non-zero. [Note: this is not necessarily a generalized linear model (GLM).]
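As a minimal concrete instance of this canonical form, the Poisson model (a standard exponential family member, commonly used with hazard-based cancer models) has $a(\phi) = 1$, $b(\varsigma) = e^\varsigma$, $c(x, \phi) = -\log x!$, natural parameter $\varsigma_l = \log \mu_l$ and mean $\mu_l = z_l h(\theta, y_l)$. The Python sketch below uses a toy hazard function of our own choosing, not the cancer hazard of [14]:

```python
import math

# Toy hazard h(theta, y): a stand-in for the cancer hazard function,
# purely illustrative (our assumption, not the model of [14]).
def h(theta, y):
    return theta[0] * math.exp(theta[1] * y)

# Poisson log-likelihood written in the canonical exponential-family form:
# sum_l [ x_l*sigma_l - b(sigma_l) ]/a(phi) + c(x_l, phi), with
# a(phi) = 1, b(sigma) = exp(sigma), c(x, phi) = -log(x!).
def log_lik(theta, x, z, y):
    total = 0.0
    for xl, zl, yl in zip(x, z, y):
        sigma = math.log(zl * h(theta, yl))           # natural parameter
        b = math.exp(sigma)                           # b(sigma) = mu_l
        total += xl * sigma - b - math.lgamma(xl + 1)  # c(x_l, phi) = -log(x_l!)
    return total

# Agrees with the direct Poisson log-likelihood sum_l log(mu^x e^-mu / x!):
theta = (0.2, 0.1); x = [1, 3, 0]; z = [10.0, 20.0, 5.0]; y = [1.0, 2.0, 3.0]
direct = sum(xl * math.log(zl * h(theta, yl)) - zl * h(theta, yl)
             - math.lgamma(xl + 1) for xl, zl, yl in zip(x, z, y))
assert abs(log_lik(theta, x, z, y) - direct) < 1e-12
```

The same skeleton accommodates any exponential-family choice of $a$, $b$, $c$; only the three functions and the link between $\mu_l$ and $h$ change.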

In this case it is seen that

$$\frac{\partial^2 L(x|\theta)}{\partial\theta_i \partial\theta_j} = \sum_{l=1}^n \left[\frac{[x_l - b'(\varsigma_l)]\, z_l}{a(\phi)\, b''(\varsigma_l)} \frac{\partial^2 h(\theta, y_l)}{\partial\theta_i \partial\theta_j} - \frac{z_l^2}{a(\phi)} \frac{\partial h(\theta, y_l)}{\partial\theta_i} \frac{\partial h(\theta, y_l)}{\partial\theta_j} \left\{\frac{[b''(\varsigma_l)]^2 + b'''(\varsigma_l)[x_l - b'(\varsigma_l)]}{[b''(\varsigma_l)]^3}\right\}\right] \qquad \text{(B1)}$$

so that the Fisher information matrix is given by

$$I(\theta)_{ij} = -E_\theta\left[\frac{\partial^2 L(x|\theta)}{\partial\theta_i \partial\theta_j}\right] = \frac{1}{a(\phi)} \sum_{l=1}^n \frac{z_l^2}{b''(\varsigma_l)} \frac{\partial h(\theta, y_l)}{\partial\theta_i} \frac{\partial h(\theta, y_l)}{\partial\theta_j} \qquad \text{(B2)}$$
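Formula (B2) can be evaluated numerically, and its rank used to flag parameter redundancy in the sense of Theorem A2(iii). The Python sketch below specializes to the Poisson case ($a(\phi) = 1$, $b''(\varsigma_l) = \mu_l = z_l h$) with a deliberately redundant toy hazard of our own choosing, which depends only on the product $\theta_1 \theta_2$:

```python
import numpy as np

# Toy hazard depending only on theta1*theta2: theta1 and theta2 are
# redundant, so the Fisher information should have rank 1 < p = 2.
# (Illustrative assumption, not the hazard of [14].)
def h(theta, y):
    return theta[0] * theta[1] * y

def grad_h(theta, y, eps=1e-6):
    """Central-difference gradient of h with respect to theta."""
    g = np.zeros(len(theta))
    for i in range(len(theta)):
        tp = np.array(theta, float); tp[i] += eps
        tm = np.array(theta, float); tm[i] -= eps
        g[i] = (h(tp, y) - h(tm, y)) / (2 * eps)
    return g

def fisher_info(theta, z, y):
    """Formula (B2) for the Poisson case: a(phi)=1, b''(sigma_l) = z_l*h."""
    p = len(theta)
    I = np.zeros((p, p))
    for zl, yl in zip(z, y):
        g = grad_h(theta, yl)
        bpp = zl * h(theta, yl)                 # b''(sigma_l) = mu_l
        I += (zl ** 2 / bpp) * np.outer(g, g)   # term of (B2)
    return I

theta = (0.5, 2.0); z = [10.0, 20.0, 5.0]; y = [1.0, 2.0, 3.0]
I = fisher_info(theta, z, y)
print(np.linalg.matrix_rank(I, tol=1e-6))   # 1: one redundant parameter
```

Since each term of (B2) is a non-negative multiple of the same rank-one outer product here, the information matrix is singular, which is exactly the rank deficiency that signals redundancy before any fitting is attempted.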