Statistical Applications in Genetics and Molecular Biology

Volume 10, Issue 1 2011 Article 33

Deviance Information Criteria for Model Selection in Approximate Bayesian Computation

Olivier Francois, University Joseph Fourier
Guillaume Laval, Institut Pasteur and Centre National de la Recherche Scientifique

Recommended Citation: Francois, Olivier and Laval, Guillaume (2011) "Deviance Information Criteria for Model Selection in Approximate Bayesian Computation," Statistical Applications in Genetics and Molecular Biology: Vol. 10: Iss. 1, Article 33. DOI: 10.2202/1544-6115.1678

Brought to you by | North Carolina State University (NCSU) Libraries Authenticated Download Date | 5/17/15 12:10 AM Deviance Information Criteria for Model Selection in Approximate Bayesian Computation

Olivier Francois and Guillaume Laval

Abstract

Approximate Bayesian computation (ABC) is a class of algorithmic methods for Bayesian inference that use statistical summaries and computer simulations. ABC has become popular in evolutionary genetics and in other branches of biology. However, model selection under ABC algorithms has been a subject of intense debate in recent years. Here, we propose novel approaches to model selection based on posterior predictive distributions and approximations of the deviance. We argue that this framework can settle some contradictions between the computation of model probabilities and posterior predictive checks using ABC posterior distributions. A simulation study and an analysis of a resequencing data set of human DNA show that the deviance criteria lead to sensible results in a number of model choice problems of interest to population geneticists.

KEYWORDS: approximate Bayesian computation, model choice, expected deviance, information criterion, population genetic models

Author Notes: Olivier Francois, University Joseph Fourier, Grenoble. Guillaume Laval, Institut Pasteur, Human Evolutionary Genetics, Department of Genomes and Genetics; Centre National de la Recherche Scientifique.


Introduction

Approximate Bayesian Computation (ABC) is a class of Monte-Carlo algorithms for parameter inference based on summary statistics instead of the full data (Beaumont et al 2002, Marjoram et al 2006, Beaumont 2010). More specifically, ABC algorithms use simulations from a stochastic model to generate random samples from an approximation of the posterior distribution of a multidimensional parameter, θ, after reduction of the original data, y0, into a set of summary statistics, s0 = s(y0). Here we will consider that s0 are the only data available to ABC analyses, and will refer to (Robert et al 2011) for discussions related to the sufficiency of the summary statistics. ABC methods found their origin in evolutionary genetics (Pritchard et al 1999, Tavaré 2004), where they have been fruitfully applied to the inference of demographic history of several species (Lopes and Beaumont 2009, Csilléry et al 2010a, Beaumont 2010). Examples of analyses encompass the evaluation of alternative scenarios of human evolution (Fagundes et al 2007, Patin et al 2009, Laval et al 2010), inference in demographic models of population expansion, bottleneck or migration (Thornton et al 2006, François et al 2008), population structure and adaptation (Bazin et al 2010).

Current ABC algorithms fall into broad subclasses of methods that extend the standard subclasses of computational algorithms used in Bayesian statistics. The first class of algorithms makes use of the rejection algorithm to accept parameters generating simulated data close to the observations (Pritchard et al 1999). The rejection algorithm performs the following steps: 1) Generate a candidate value θ from a prior distribution; 2) Simulate a data set, y, from a generating mechanism using the parameter θ, and compute the set of summary statistics s = s(y); 3) Accept the value of θ if the (Euclidean) distance between s and s0 is less than ε, a prespecified error value; 4) If rejected, go to 1).
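The four steps of the rejection algorithm can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy model (20 Gaussian observations with unknown mean), the uniform prior, the value of eps and all function names are hypothetical choices made for the example.

```python
import math
import random


def rejection_abc(s0, prior_sample, simulate, summarize, eps, n_accept, rng):
    """Basic ABC rejection sampler, following steps 1) to 4) of the text."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample(rng)              # 1) candidate value from the prior
        s = summarize(simulate(theta, rng))    # 2) simulate data, reduce to summaries
        if math.dist(s, s0) < eps:             # 3) accept if Euclidean distance < eps
            accepted.append(theta)
        # 4) if rejected, the loop returns to step 1
    return accepted


# Hypothetical toy model: 20 Gaussian observations with unknown mean, unit variance.
def prior_sample(rng):
    return rng.uniform(-10.0, 10.0)


def simulate(theta, rng):
    return [rng.gauss(theta, 1.0) for _ in range(20)]


def summarize(y):
    return (sum(y) / len(y),)   # a single summary statistic: the sample mean


rng = random.Random(1)
s0 = (2.0,)                     # observed summary statistic (assumed)
sample = rejection_abc(s0, prior_sample, simulate, summarize,
                       eps=0.1, n_accept=200, rng=rng)
posterior_mean = sum(sample) / len(sample)
```

Decreasing eps tightens the approximation but lowers the acceptance rate, which is the efficiency trade-off this algorithm faces.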
For this basic algorithm, the accepted values (θi) form a random sample from an approximation of the posterior distribution. This approximation becomes exact as ε goes to zero, but the algorithm is then highly inefficient. Recent techniques improve the approximation of the posterior distribution by applying linear or non-linear transforms (Beaumont et al 2002, Wegmann et al 2009, Leuenberger and Wegmann 2010, Blum and François 2010). In those improvements, the accepted values of the parameter, θi, are weighted by a quantity that depends on the distance between si and s0. Then they are adjusted according to a regression transform, for example θi* = θi − b^T (si − s0), where b is a vector of coefficients (Beaumont et al 2002). Several studies have provided evidence that the transformed parameters form a significantly better approximation of the posterior distribution than the non-transformed ones (Beaumont et al 2002, Blum and François 2010), and regression adjustments are now widely used by ABC practitioners (Thornton 2009, Cornuet et al 2009, Lopes and Beaumont 2009). Two other classes of algorithms implement Markov chain Monte Carlo methods without likelihood (Marjoram et al 2003, Bortot et al 2007) and iterative algorithms that were originally inspired by sequential Monte Carlo samplers (Sisson et al 2007, Beaumont et al 2009, Toni et al 2009).

An important aspect of ABC is its use for model selection in addition to parameter estimation. In general the aim of model selection is to find models receiving the highest posterior probabilities among a finite subset of candidates. Bayesian statisticians have devised numerous ways to evaluate and select models for inference (Gelman et al 2004). Assuming that there are M models under consideration, the Bayesian paradigm includes model selection in the inference step, taking the model label as an additional parameter, m. In decision theoretic approaches, model choice is performed on the basis of posterior probabilities, p(m|s0), which are proportional to the marginal probabilities, p(s0|m), when the models are equally probable a priori. In ABC, these probabilities can be crudely estimated by counting simulations from model m that fall at a distance less than a fixed value, ε, to the observed data. More sophisticated estimators of posterior model probabilities can be found in (Beaumont 2008) or in (Leuenberger and Wegmann 2009).
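The counting estimator of posterior model probabilities can be illustrated as follows. This is a sketch under assumptions: the two toy models, the single summary statistic, the tolerance and the function name `model_probabilities` are hypothetical, and equal prior probabilities are assumed for the two models.

```python
import math
import random


def model_probabilities(simulations, s0, eps):
    """Estimate p(m | s0) by counting accepted simulations per model.

    `simulations` is a list of (model_label, summary_vector) pairs in which
    the models appear with frequencies given by their prior probabilities.
    """
    counts = {}
    for label, s in simulations:
        counts.setdefault(label, 0)
        if math.dist(s, s0) < eps:
            counts[label] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no simulation accepted; increase eps")
    return {label: c / total for label, c in counts.items()}


# Hypothetical toy models with one summary statistic each (illustration only).
rng = random.Random(2)
simulations = [("m1", (rng.gauss(0.0, 1.0),)) for _ in range(20000)]
simulations += [("m2", (rng.gauss(0.0, 3.0),)) for _ in range(20000)]

probs = model_probabilities(simulations, s0=(0.0,), eps=0.2)
bayes_factor = probs["m1"] / probs["m2"]   # equal model priors assumed
```

With equal model priors, the ratio of the two estimated probabilities is also a crude estimate of the Bayes factor.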
Alternatively, sequential Monte-Carlo algorithms can also be used to estimate model probabilities via iterated importance procedures (Toni and Stumpf 2010).

Model selection using ABC algorithms has been recently questioned (Templeton 2009, Beaumont et al 2010, Csilléry et al 2010a, Didelot et al 2011, Robert et al 2011). Here we point out a potentially serious concern when selecting models on the basis of approximate posterior model probabilities. Because approximate model probability estimates are based on the rejection algorithm and ignore regression adjustments on parameter samples, we argue that model choice based on these probabilities does not apply to the (approximate) models in which we eventually make inference. To see this, assume θ|m = θm, and let

$$ p(\theta_m \mid s_0) \propto \Pr(\|s - s_0\| \le \epsilon \mid \theta, m)\, p(\theta_m) , \tag{1} $$

be the approximation of the posterior distribution obtained from the rejection algorithm, where p(θm) denotes the prior distribution on the parameter θm for model m. The joint distribution defining model m is then equal to

$$ p(\theta_m, s_0 \mid m) = p(\theta_m \mid s_0)\, p(s_0 \mid m) . \tag{2} $$

Regression adjustments replace p(θm|s0) with another distribution preg(θm|s0) which is generally closer to the exact posterior distribution. Clearly this change modifies the joint distribution in equation (2). Thus a model chosen on the basis of p(m|s0) can be different from the model in which we eventually estimate parameter uncertainty.

In the next section, we define two information theoretic criteria for model selection based on measures of model fit penalized by an estimate of the model complexity. While our focus is on regression methods, the ideas introduced in the present study apply to any ABC algorithm. The approach shares similarities with the popular Akaike information criterion (AIC, Akaike 1974), which is valid for the comparison of nested models (Burnham and Anderson 2002, Johnson and Omland 2004, Ripley 2004, Carsten et al 2009). The assumption of nested models is seldom appropriate to ABC, and we develop approximate deviance information criteria (DIC), which are generalizations of AIC that do not require the assumption of nested models (Spiegelhalter et al 2002, Gelman et al 2004). Then we provide an example of ABC analysis where model choice based on approximate probabilities disagrees with the prediction of adjusted models and DICs. Using simulations, we study the relevance of the proposed information criteria to inference in population genetics under various models of demographic history and population structure. In the last part of the study, we present an application to an empirical genetic data set of 20 noncoding DNA regions resequenced from 213 humans from distinct continents (Laval et al 2010), and we use the approximate deviance and DIC to question the replacement of Neanderthals by modern humans.

Theory

In this section we describe model selection criteria based on posterior predictive distributions and approximations of the deviance.

Information theoretic criteria. In Bayesian analyses, the deviance information criterion summarizes the fit of a model by the posterior expectation of the deviance, $\bar{D}$, and the complexity of a model by its effective number of parameters, pD (Spiegelhalter et al 2002). The models that receive the highest support from the data are those with the lowest values of the DIC. More specifically, the definition of DIC is

$$ \mathrm{DIC} = \bar{D} + p_D , \tag{3} $$

where the deviance is minus twice the logarithm of the likelihood, D(θ) = −2 log p(s0|θ), $\bar{D}$ is the expected deviance

$$ \bar{D} = E_{\theta \mid s_0}[D(\theta)] , \tag{4} $$

and pD is the difference between $\bar{D}$ and the deviance evaluated at a particular point estimate, $D(\hat{\theta})$. An example of $\hat{\theta}$ often used in applications is the estimate of the posterior mean of the model parameter.

A complication arises when models are defined hierarchically. In hierarchical models there is a hidden parameter ϕ, and the posterior distribution decomposes as follows

$$ p(\theta, \phi \mid s_0) \propto p(s_0 \mid \phi)\, p(\phi \mid \theta)\, p(\theta) . \tag{5} $$

In this situation, several definitions of the deviance and DIC have been proposed depending on the focus of the model (Spiegelhalter et al 2002; Celeux et al 2006). For example, focusing on θ, the deviance can be taken equal to

$$ D(\theta) = -2 \log \left( \int_{\phi} p(s_0 \mid \phi)\, p(\phi \mid \theta)\, d\phi \right) , \tag{6} $$

and the computation of DIC should be modified accordingly.
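Returning to the basic definition in equations (3) and (4), the computation of DIC from posterior draws can be sketched when the likelihood is available in closed form. The example below is an illustration under assumptions, not part of the original study: it uses a hypothetical Gaussian model with known unit variance and a flat prior on the mean, for which the effective number of parameters should be close to 1.

```python
import math
import random


def deviance(theta, data, sigma=1.0):
    """D(theta) = -2 log p(data | theta) for i.i.d. Gaussian data with known sigma."""
    log_lik = sum(-0.5 * math.log(2.0 * math.pi * sigma ** 2)
                  - 0.5 * ((y - theta) / sigma) ** 2 for y in data)
    return -2.0 * log_lik


def dic(posterior_draws, data):
    """Return (DIC, D_bar, p_D), using the posterior mean as point estimate."""
    d_bar = sum(deviance(t, data) for t in posterior_draws) / len(posterior_draws)
    theta_hat = sum(posterior_draws) / len(posterior_draws)
    p_d = d_bar - deviance(theta_hat, data)
    return d_bar + p_d, d_bar, p_d


rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(20)]
y_bar = sum(data) / len(data)
# Flat prior on the mean, so the posterior is N(y_bar, 1/n) (assumed toy model).
draws = [rng.gauss(y_bar, 1.0 / math.sqrt(len(data))) for _ in range(5000)]
dic_value, d_bar, p_d = dic(draws, data)
```

In this one-parameter model the penalty p_d estimates n times the posterior variance of the mean, which equals 1 in expectation.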

A hierarchical model approach to ABC. Without regression adjustments, one way to define ABC is as a hierarchical Bayesian model in which the simulated summary statistics are viewed as latent variables. In this hierarchical model, the posterior distribution decomposes as

$$ p(\theta, s \mid s_0) \propto p(s_0 \mid s)\, p(s \mid \theta)\, p(\theta) \tag{7} $$

where s are the simulated statistics, and θ becomes the “hyper-prior” parameter.


In this model, p(s|θ) is the probability of generating the summary statistics, s, with parameter θ. To make use of a hierarchical framework, we define a surrogate model

$$ p(s_0 \mid s) \equiv K_\epsilon(s_0 - s) = \frac{1}{\epsilon} K\!\left( \frac{s_0 - s}{\epsilon} \right) , \tag{8} $$

where K is a density function, called the kernel, and ε is the error parameter. The distribution p(s0|s) can be viewed as a model for the observation error, and ABC performs exact inference under the assumption of model error (Wilkinson 2008). The hierarchical model reformulation of ABC dates to the work of Marjoram et al (2003) who used it as a rationale for defining MCMC algorithms without likelihood. This point of view has also proven useful in a variety of theoretical works on ABC (Bortot et al 2007, Ratmann et al 2009, Wilkinson 2008).

Regarding the basic rejection algorithm, the definition amounts to choosing a uniform density function over the interval (0, 1) for the kernel. In this case, we obtain

$$ p(s_0 \mid \theta) \approx \int_s p(s_0 \mid s)\, p(s \mid \theta)\, ds = \Pr(\|s_0 - s\| < \epsilon \mid \theta) . \tag{9} $$

Extensions of the rejection algorithm use non-uniform kernels. For example, Beaumont et al (2002) implemented the Epanechnikov function, which is popular in density estimation. Because we want to relate the quantity log Kε(s0 − s) to a natural measure of model fit, we take the Gaussian kernel

$$ K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2} , \quad u \in \mathbb{R} . \tag{10} $$

With this choice, the quantity −2 log K1(s0 − s) has a natural interpretation as the sum of squares error between observed and simulated summary statistics (up to an additive constant).

The surrogate model presented above has a two-level hierarchy. Following Spiegelhalter et al (2002) or Celeux et al (2006), distinct definitions of DIC can be proposed, depending on whether the focus is on the fit of the summary statistics to the observed ones or on the model parameters themselves. Focusing on the parameter level allows us to better evaluate the predictive power of the fitted models, and we next introduce two definitions for an approximate deviance at this level.
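The link between the Gaussian kernel of equation (10) and the sum of squares can be checked numerically. The sketch below makes one assumption that the text does not state explicitly: for vector-valued summary statistics, the kernel is taken as a product of univariate Gaussian kernels. The numerical values of s0 and s are hypothetical.

```python
import math


def log_gaussian_kernel(u):
    """log K(u) for the Gaussian kernel of equation (10)."""
    return -0.5 * u * u - 0.5 * math.log(2.0 * math.pi)


def log_kernel_eps(s0, s, eps):
    """log K_eps(s0 - s), with K_eps(u) = K(u / eps) / eps as in equation (8).

    For vector-valued summaries, a product of univariate kernels is assumed.
    """
    return sum(log_gaussian_kernel((a - b) / eps) - math.log(eps)
               for a, b in zip(s0, s))


s0 = (2.0, 3.0)   # hypothetical observed summaries
s = (2.5, 2.0)    # hypothetical simulated summaries
sum_of_squares = sum((a - b) ** 2 for a, b in zip(s0, s))
minus_two_log_k1 = -2.0 * log_kernel_eps(s0, s, eps=1.0)
constant = len(s0) * math.log(2.0 * math.pi)
```

Under this assumption, `minus_two_log_k1` equals `sum_of_squares + constant`, so differences in deviance between simulated data sets reduce to differences in squared error when ε = 1.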


A first way to define a Bayesian deviance is by considering a posterior predictive average of a “low level” deviance

$$ \mathrm{dev}(\theta) = -2\, E_{s \mid \theta}[\log p(s_0 \mid s)] = -2 \int_s \log p(s_0 \mid s)\, p(s \mid \theta)\, ds . \tag{11} $$

In this case, the expected Bayesian deviance is

$$ \bar{D}_1 = E_{\theta \mid s_0}[\mathrm{dev}(\theta)] = -2\, E_{s \mid s_0}[\log p(s_0 \mid s)] . $$

With this definition, a Monte-Carlo estimate of the expected deviance can be easily computed from the simulated data as follows

$$ \bar{D}_1 \approx -\frac{2}{n} \sum_{j=1}^{n} \log K_\epsilon(s^j - s_0) , \tag{12} $$

where the $s^j$ are summary statistics obtained from the posterior predictive distribution p(s|s0). To compute the penalty pD,1, we generate n summary statistics $s^j$ from $p(s \mid \hat{\theta})$, where $\hat{\theta}$ is a point estimate of θ, for example an estimate of the posterior mean, E[θ|s0]. Applying the same formula as above, we come up with an estimate $\bar{D}_1(\hat{\theta})$ that we use to define pD,1

$$ p_{D,1} = \bar{D}_1 - \bar{D}_1(\hat{\theta}) . \tag{13} $$
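Equations (12) and (13) translate into a short Monte Carlo routine. The following is a minimal sketch under assumptions: the parameter is scalar, the posterior sample is a stand-in drawn directly from a Gaussian rather than produced by ABC, the kernel is a product of Gaussian kernels, and all function names are hypothetical.

```python
import math
import random


def log_kernel(s, s0, eps):
    """log K_eps(s - s0) for a product of Gaussian kernels (assumed for vectors)."""
    return sum(-0.5 * ((a - b) / eps) ** 2 - math.log(eps) - 0.5 * math.log(2.0 * math.pi)
               for a, b in zip(s, s0))


def mean_deviance(summaries, s0, eps):
    """Equation (12): -(2/n) * sum_j log K_eps(s^j - s0)."""
    return -2.0 * sum(log_kernel(s, s0, eps) for s in summaries) / len(summaries)


def dic1(posterior_draws, simulate_summary, s0, eps, n, rng):
    """Return (DIC1, D1_bar, p_D1) for a scalar parameter."""
    # Posterior predictive summaries: pick a posterior draw, then simulate from it.
    predictive = [simulate_summary(rng.choice(posterior_draws), rng) for _ in range(n)]
    d_bar = mean_deviance(predictive, s0, eps)
    # Point estimate: posterior mean; simulate n summaries from p(s | theta_hat).
    theta_hat = sum(posterior_draws) / len(posterior_draws)
    at_estimate = [simulate_summary(theta_hat, rng) for _ in range(n)]
    p_d = d_bar - mean_deviance(at_estimate, s0, eps)   # equation (13)
    return d_bar + p_d, d_bar, p_d


# Hypothetical toy model: the summary is the mean of 20 unit-variance Gaussians.
def simulate_summary(theta, rng):
    return (rng.gauss(theta, math.sqrt(1.0 / 20.0)),)


rng = random.Random(3)
s0 = (2.0,)
# Stand-in for an ABC posterior sample (assumption, not produced by ABC here).
posterior_draws = [rng.gauss(2.0, math.sqrt(1.0 / 20.0)) for _ in range(2000)]
dic1_value, d1_bar, p_d1 = dic1(posterior_draws, simulate_summary, s0,
                                eps=1.0, n=5000, rng=rng)
```

The penalty is positive here because simulating from the whole posterior sample spreads the predictive summaries more widely around s0 than simulating from the point estimate alone.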

Though the focus is on the parameter θ, the previous definition of a deviance is not equivalent to equation (6). A definition of the deviance in a hierarchical model consistent with this equation is as follows

$$ D(\theta) = -2 \log p(s_0 \mid \theta) = -2 \log \left( \int_s p(s_0 \mid s)\, p(s \mid \theta)\, ds \right) , \tag{14} $$

which is also equal to

$$ D(\theta) = -2 \log \left( E_{s \mid \theta}[K_\epsilon(s - s_0)] \right) . \tag{15} $$

With this definition, an estimate of the expected deviance requires two levels of Monte Carlo integration

$$ \bar{D}_2 = E_{\theta \mid s_0}[D(\theta)] \approx -\frac{2}{m} \sum_{i=1}^{m} \log \left( \frac{1}{n} \sum_{j=1}^{n} K_\epsilon(s_i^j - s_0) \right) \tag{16} $$


where we have m replicates, (θi)i=1,...,m, from the approximate posterior distribution, p(θ|s0), and each $s_i^j$ is sampled from p(s|θi), j = 1, . . . , n. To compute pD,2, we generate n summary statistics, $s^j$, from the conditional distribution $p(s \mid \hat{\theta})$, where $\hat{\theta}$ is a point estimate of θ, and we set

$$ \bar{D}_2(\hat{\theta}) \approx -2 \log \left( \frac{1}{n} \sum_{j=1}^{n} K_\epsilon(s^j - s_0) \right) . \tag{17} $$

Then we define

$$ p_{D,2} = \bar{D}_2 - \bar{D}_2(\hat{\theta}) . \tag{18} $$

Both definitions of $\bar{D}$ and pD lead to distinct definitions of an information criterion, DICi = $\bar{D}_i$ + pD,i, i = 1, 2. DIC2 has the advantage of defining DIC for ABC models more rigorously than DIC1, but it has the disadvantage of being computationally more intensive.

Unlike model probabilities, $\bar{D}$ and pD can be computed from any approximation of the posterior distribution. Using linear or non-linear regression adjustments, we can consider the transformed parameter posterior distribution, preg(θm|s0), instead of p(θm|s0). To compute DIC, we then replace the θi's by their adjusted values θi*'s, sampled from the modified posterior distribution, and generate posterior predictive densities from these values. In the sequel, DICi will refer to predictive distributions generated from adjusted parameters.

To motivate the use of information criteria and illustrate some of the issues presented in the introduction, we consider an example where samples of size n = 20 are simulated from a Gaussian distribution of mean µ0 = 2 and standard deviation σ0 = 3 (then assumed to be unknown). The data are summarized by four statistics, including the empirical mean and the kurtosis, and the sample size is known. We observed s0 = (2.00, 3.11, −0.78, 0.14). For these data two models are hypothesized. The sampling distribution of the first model is a Gaussian distribution where the parameter, θ = (µ, σ²), corresponds to the mean and the variance. The prior distribution on µ is a Gaussian distribution of mean 2 and standard deviation 10. The prior distribution on σ² is an inverse-exponential distribution of rate 1. The sampling distribution of the second model is a Laplace distribution of mean 3 and rate λ. The prior distribution on λ is an exponential distribution of rate 1. To perform ABC analyses, we simulated 10,000 samples from each model, and pooled the 20,000 vectors of summary statistics into a single data set.
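Before turning to the results, the two-level estimator of equations (16) to (18) can be sketched on a simplified version of this Gaussian setting. Everything specific in the block is an assumption made for illustration: only two summary statistics (mean and standard deviation) are used instead of four, the parameter is (µ, σ), the posterior sample is a stand-in rather than ABC output, ε is set to 1, and the function names are hypothetical.

```python
import math
import random


def log_kernel(s, s0, eps):
    """log K_eps(s - s0) for a product of Gaussian kernels (assumed for vectors)."""
    return sum(-0.5 * ((a - b) / eps) ** 2 - math.log(eps) - 0.5 * math.log(2.0 * math.pi)
               for a, b in zip(s, s0))


def log_mean_kernel(summaries, s0, eps):
    """log( (1/n) * sum_j K_eps(s^j - s0) ), computed stably via log-sum-exp."""
    logs = [log_kernel(s, s0, eps) for s in summaries]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs) / len(logs))


def dic2(posterior_draws, simulate_summary, s0, eps, m, n, rng):
    """Return (DIC2, D2_bar, p_D2) following equations (16) to (18)."""
    total = 0.0
    for _ in range(m):                                            # outer level: theta_i
        theta = rng.choice(posterior_draws)
        sims = [simulate_summary(theta, rng) for _ in range(n)]   # inner level: s_i^j
        total += log_mean_kernel(sims, s0, eps)
    d_bar = -2.0 * total / m                                      # equation (16)
    dim = len(posterior_draws[0])
    theta_hat = tuple(sum(t[k] for t in posterior_draws) / len(posterior_draws)
                      for k in range(dim))                        # posterior mean
    sims_hat = [simulate_summary(theta_hat, rng) for _ in range(n)]
    d_hat = -2.0 * log_mean_kernel(sims_hat, s0, eps)             # equation (17)
    p_d = d_bar - d_hat                                           # equation (18)
    return d_bar + p_d, d_bar, p_d


def simulate_summary(theta, rng, size=20):
    """Simulate `size` Gaussian observations; return (mean, standard deviation)."""
    mu, sigma = theta
    y = [rng.gauss(mu, sigma) for _ in range(size)]
    mean = sum(y) / size
    sd = math.sqrt(sum((v - mean) ** 2 for v in y) / (size - 1))
    return (mean, sd)


rng = random.Random(4)
s0 = (2.0, 3.0)   # hypothetical observed (mean, standard deviation)
# Stand-in posterior sample for (mu, sigma); an assumption, not ABC output.
posterior_draws = [(rng.gauss(2.0, 0.7), max(0.5, rng.gauss(3.0, 0.5)))
                   for _ in range(1000)]
dic2_value, d2_bar, p_d2 = dic2(posterior_draws, simulate_summary, s0,
                                eps=1.0, m=200, n=200, rng=rng)
```

The inner average is taken on the kernel scale before the logarithm, which is what distinguishes this estimator from the one in equation (12); the cost is m × n simulations instead of n.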


Using an acceptance rate of 10%, we estimated model probabilities using the R package abc (Csilléry et al 2011, R core team 2010). This package computes the proportion of accepted simulations under both models, and also implements the weighted method of (Beaumont 2008). The Bayes factor, defined as the ratio of marginal probabilities for two models m1 and m2, can be estimated as the ratio of counts in favor of m1 and m2 (Pritchard et al 1999, Grelaud et al 2009). The proportion of accepted simulations from model 2 (Laplace) was 0.83. Assuming a uniform prior distribution on models 1 and 2 we obtained an approximation of the Bayes factor equal to BF ≈ 5.02. The logistic regression estimate for the posterior probability of model 2 was 0.85, and we obtained BF ≈ 5.75. According to Jeffreys' scale on Bayes factors (Jeffreys 1961), there would be substantial evidence in favor of the Laplace model over the Gaussian model.

In a second stage, we performed regression adjustments to the approximate posterior samples. The observed value of the kurtosis was outside the tails of the posterior predictive distribution under the Laplace model. In contrast it was within the tails of the posterior predictive distribution under the Gaussian model. Thus there is an apparent contradiction between the computation of model probabilities and the predictions from the posterior distribution. To better understand this contradiction, we drew the exact posterior distribution of σ² under the Gaussian model (an inverse-Gamma(11, 1 + 9.5 v0²) distribution, where v0² is the empirical variance). Although posterior density approximations are improved by the adjustment method (Figure 1), the estimates of model probabilities did not account for such improvements. Then we computed DICs for the Gaussian and Laplace models. Under the Gaussian model, we obtained DIC1 = 4.5 and DIC2 = 3.2. Under the Laplace model, we obtained DIC1 = 10.1 and DIC2 = 4.5. Once the corrections were applied, DIC indicated that the Gaussian model was a better choice than the Laplace model.

To investigate whether our analysis was robust, we replicated it 100 times with values of s0 sampled from the same Gaussian distribution in each replicate. The average value of the Bayes factor was around 3.45 (3.58 when using the logistic regression method) in favor of the Laplace model, which obtained the highest posterior probability in 100% of the replicates. In contrast, the Gaussian model was preferred in 100% of the replicates when DIC was used. The mean value of the information criterion was around 4.1 (2.4 for DIC2) under the Gaussian model, whereas it was around 12.4 (3.7 for DIC2) under the Laplace model.
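The posterior predictive check used in this example, whether an observed summary statistic lies within the tails of its posterior predictive distribution, can be sketched with a simple tail probability. The text does not say how the check was carried out, so the two-sided rule below, the function name and the simulated values are assumptions for illustration only.

```python
import random


def tail_probability(predictive_values, observed):
    """Two-sided tail probability of `observed` under a predictive sample."""
    n = len(predictive_values)
    below = sum(v <= observed for v in predictive_values) / n
    above = sum(v >= observed for v in predictive_values) / n
    return min(1.0, 2.0 * min(below, above))


rng = random.Random(5)
# Hypothetical posterior predictive sample of one summary statistic.
predictive = [rng.gauss(0.0, 1.0) for _ in range(5000)]
p_inside = tail_probability(predictive, 0.1)    # observed value near the centre
p_outside = tail_probability(predictive, 5.0)   # observed value far in the tail
```

A value close to zero indicates that the fitted model rarely reproduces the observed statistic, which is the situation reported for the kurtosis under the Laplace model.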


Figure 1: Posterior distributions of parameter σ² (Gaussian model). A) The exact posterior distribution. B) The approximation obtained from the rejection algorithm. C) The approximation obtained with the regression adjustment. (Horizontal axis: σ²; vertical axis: density.)

Population genetic models

For each simulated data set and each model considered afterwards, we performed 10,000 simulations under specified prior distributions (100,000 for the human data). We ran ABC analyses using an acceptance rate of 10% (1% for the human data). To compute DIC1 we additionally created n = 1,000 replicates from the posterior predictive distribution. For DIC2, we used m = 200 and n = 200 replicates.

Demographic models. We generated coalescent simulations under selectively neutral models of micro-evolution for 3 distinct demographic scenarios: a sudden decline in population size (bottleneck without recovery), a constant population size, and an exponentially growing population size. In these data, fifty diploid individuals were genotyped at 20 non-recombining loci. The data were simulated as DNA sequences under an infinitely many-sites model using the computer program ms (Hudson 2002). For each of the three models, we simulated one hundred replicates of the data with fixed parameter values. In ms, the parameters are expressed in units of the current population size, N0. In our simulations, the normalized mutation rate was equal to θ = 2N0µ = 3. In the bottleneck model, the population size shrank to a fraction x = 1/4 of its ancestral value, and this event occurred t = 0.2N0 generations in the past. In the expanding population model, the expansion rate was set to α = 2.

The prior distribution on the mutation rate, θ, was uniform over (0, 15) for all models. In the bottleneck model, the date of the bottleneck event (in units of N0) was uniformly distributed over (0, 1), and log10 x was uniformly distributed over (0, 1.5). In the expansion model log10 α was also uniformly distributed over (0, 1.5). ABC analyses were performed using the following summary statistics: Tajima's estimator π, computed as the mean number of differences between pairs of sequences, Tajima's D (Tajima 1989), and Fay and Wu's H statistic (Fay and Wu 2000). The three summary statistics were averaged over the 20 loci.

We applied ABC analyses and deviance information criteria to population genetic data simulated under a bottleneck, a constant population size and an expansion model (100 data sets for each demographic model). Table 1 reports congruent results for DIC1 and DIC2, and Figure 2 reports the outcome of model selection for the three models.

Simulated model         Inference model
                        Bottleneck       Constant size    Expansion
Bottleneck*     DIC1    13.66 (7.72)     12.83 (7.99)     18.36 (9.82)
                DIC2    10.08 (8.80)     10.59 (6.08)     17.04 (9.71)
Constant size   DIC1    40.06 (10.93)     2.83 (0.18)      3.15 (0.30)
                DIC2    30.24 (14.71)     2.71 (0.17)      2.95 (0.39)
Expansion       DIC1    17.48 (4.59)      3.65 (0.64)      3.36 (0.23)
                DIC2    11.40 (4.12)      3.58 (0.69)      2.87 (0.17)

Table 1: Comparisons of demographic models using DIC1 and DIC2. The rows correspond to the models used for simulating the data, and the columns correspond to the models under which inference was performed. The values represent the mean and standard deviation of DICs computed over 100 independent replicates for each model. *: Values under the bottleneck model were computed with the median of the posterior deviance instead of their mean, because the median is less sensitive to large deviations.

Figure 2: Demographic scenarios. Model choice using the deviance information criterion, DIC2, when a specific scenario is assumed. See the text for the definition of models. The frequencies were computed over 100 independent replicates from each model. The same results were obtained with DIC1.

Although DIC1 is generally greater than DIC2, the two measures produced highly correlated results (Figure 3). In these examples, both criteria agreed in their evaluation of models. When the data were simulated under the bottleneck model, reported estimates were obtained with the median of the posterior deviance instead of their mean (the mean provided highly variable results). The preferred model was the bottleneck model in 61/100 replicates. When the data were simulated under the constant population size model, the preferred model was the constant size model in 81/100 replicates. When the data were simulated under the expanding population size model, the preferred model was the expanding population model in 89/100 replicates. Overall, models that did not generate the data were selected in a small but non-negligible number of replicates (23%). For data simulated under the bottleneck model, the lower performance can be explained by the relatively recent date of the bottleneck event and by the choice of diffuse prior distributions. Both factors contribute to the difficulty of distinguishing between a bottleneck and a constant size model. In addition there is great variability in the simulated summary statistics, and this can explain why the bottleneck and constant population size models were difficult to tease apart.

Figure 3: Correlation between DIC1 and DIC2. A) Inference under a population expansion model for data simulated under a constant population size model. B) Inference under a constant population size model for data simulated under an expanding population model. (Vertical axis: DIC1; horizontal axis: DIC2.)

To gain insight on model choice, we examined the results obtained for a particular replicate in further detail. The replicate was obtained from the bottleneck model, where the effective mutation rate, the time of the bottleneck event and the severity of the bottleneck 1/x were set to the values (3, 0.2, 4). The observed summary statistics were equal to π = 8.58, D = 1.32 and H = 9.42. Under the bottleneck model, the point estimates of the effective mutation rate (posterior mean = 2.90, 95% CI = [1.34, 4.36]), the time of the bottleneck event (posterior mean = 0.35, 95% CI = [0.04, 0.62]) and the severity of the bottleneck (posterior mean = 5.01, 95% CI = [2.18, 47.81]) were close to the values used in the simulation. DIC2 was equal to 14.13 (median estimate), whereas it was equal to 17.83 under the constant population size model. In this case the DIC slightly favored the model that generated the data. Under the bottleneck model, the observed values of the summary statistics were indeed within the tails of the posterior predictive distributions whereas they were outside the tails in the constant size population model (Figure 4).

We also investigated why in some cases DIC failed to select the model from which the data were simulated. For one bottleneck data set, the observed summary statistics were equal to π = 6.77, D = 0.56 and log H = 1.71. DIC2 was equal to 10.11 (median estimate) under the bottleneck model, whereas it was equal to 6.08 under the constant population size model. Under both models, the observed values of the summary statistics were within the tails of the posterior predictive distributions, but the posterior distributions had larger tails under the bottleneck model than under the constant size model, and DIC favored the most parsimonious model (constant population size). This shows that in some cases the data do not contain enough information to discriminate between models, and highlights the difficulty of making model inference under coalescent models.
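The first summary statistic used in the demographic analyses, Tajima's estimator π, is described in the text as the mean number of differences between pairs of sequences, averaged over loci. A minimal sketch of that computation is given below; the haplotype data and function names are hypothetical, and sequences are assumed to be aligned strings of equal length.

```python
from itertools import combinations


def pairwise_diversity(sequences):
    """Mean number of differences between all pairs of aligned sequences."""
    pairs = list(combinations(sequences, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def mean_over_loci(loci):
    """Average the statistic over loci, as done for the summary statistics."""
    return sum(pairwise_diversity(seqs) for seqs in loci) / len(loci)


# Hypothetical haplotypes at two loci, coded 0 (ancestral) / 1 (derived).
locus_a = ["0010", "0110", "1010", "0011"]
locus_b = ["000", "001", "001", "011"]
pi_a = pairwise_diversity(locus_a)
pi_mean = mean_over_loci([locus_a, locus_b])
```

For `locus_a` the six pairs differ at 1, 1, 1, 2, 2 and 2 sites, so `pi_a` is 1.5; `locus_b` gives 1.0, and the average over the two loci is 1.25.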

Isolation with migration. Next w e generated coalescent simulations un- der 3 distinct models of populationstructure: of t w o subpop- ulations, divergence with unidirectional gene flow from subpopulation 1 to subpopulation 2, and divergence with migration betweent w o subpopulations. Fifty diploid individuals in each populationw e r e genotyped at 100 non re- combining loci. F o r each of the three models, w e simulated one h u n d r e d replicates of the data with fixed parameter v a l u e s . F o r these models, the parameters are expressed in units of the current subpopulation size, N0. In our simulations, the normalized m u t a t i o n rate w a s equal to θ= 2N0µ = 4. In each model, the t w o subpopulations had equal populationsize (both w e r e equal to N0). F o r the divergence model, the split occurred0.7N0 generations ago. In the divergence with migration model, the normalized migration rates w e r e equal to m1N0 = 40 and m2N0 = 30, whereas in the unidirectional gene flow model, w e took m1N0 = 40 and m2N0 = 0. When running ABC inferences, subpopulations sizes w e r e parameterized as ν1N0 and ν2N0 respectively, and w e used uniform prior distributions on (0, 3) bothfor ν1 and ν2. In all models, the divergence time w a s uniformly distributed o v e r (0, N0). Normalized migration rates w e r e uniformly dis- tributed o v e r (0, 100). In addition to the three summary statistics π, D and H used in demographic models, w e also computed an FST statistic for each data set according to Hudson’s estimator (Hudson et al 1992). The summary statistics w e r e a v e r a g e d o v e r the 100 loci. W e applied ABC analyses to 300 data sets simulated under the above models of populationstructure and gene flow betweent w o subpopulations. T a b l e 2 reports congruent results for DIC1 and DIC2, and Figure 5 describes the results of model selection for the three models of populationstructure. 
[Figure 4 plot. Histograms of three summary statistics (Diversity, Tajima's D, log(Fay and Wu's H)) in two rows of panels: "Posterior predictive distributions (Constant size model)" and "Posterior predictive distributions (Bottleneck model)". Vertical axis: Frequency.]

Figure 4: Posterior predictive distributions of the summary statistics under the constant size model and the bottleneck model. The vertical bars correspond to the observed summary statistics.

As for the simulations of demographic scenarios, DIC1 and DIC2 produced highly correlated results, and agreed in their evaluation of models. When the data were simulated under the divergence model, the preferred model was the divergence model in 98/100 replicates. When the data were simulated under a model with unidirectional migration, the preferred model was the model with asymmetric migration in 92/100 replicates. When the data were simulated under the bidirectional migration model, DIC selected the divergence model and the asymmetric gene flow model in nearly equal numbers of replicates (49/100 and 51/100), and it never selected the bidirectional migration model. The summary statistics used were not informative enough to distinguish between these models of migration (Nielsen and Wakeley 2001, Hey and Machado 2003). Figure 6 displays posterior predictive checks for a data set simulated under the bidirectional migration model, for which the smallest DIC favored the divergence model. This result should not be considered as an
error. The correct interpretation is that the three models were equally good at reproducing the observed summary statistics, and DIC values confirmed that we could not make any strong decision in this case.

Simulated model                 Isolation     Asymmetric migration   Migration

Isolation               DIC1    2.97 (.15)    3.37 (.25)             4.67 (.71)
                        DIC2    2.67 (.12)    3.18 (.39)             4.15 (.81)

Asymmetric migration    DIC1    3.13 (.25)    2.79 (.19)             4.10 (.31)
                        DIC2    2.84 (.23)    2.65 (.20)             3.41 (.33)

Migration               DIC1    3.02 (.12)    3.02 (.20)             3.82 (.25)
                        DIC2    2.73 (.11)    2.85 (.19)             3.25 (.20)

Table 2: Comparison of isolation with migration models. The rows correspond to the models used for simulating the data, and the columns correspond to the models under which inference was performed. The values represent the mean and standard deviation (in parentheses) of DIC1 and DIC2 computed over 100 independent replicates for each model.
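The comparison of mean DIC values in Table 2 can be reproduced with a short script. The sketch below assumes that the first value reported in each cell of Table 2 is DIC1; the helper `preferred_models` and the dictionary layout are illustrative names introduced here, and ties are reported explicitly rather than broken arbitrarily.

```python
# Mean DIC1 values transcribed from Table 2
# (keys: simulated model; list order: fitted model, as in MODELS).
MODELS = ["isolation", "asymmetric migration", "migration"]
MEAN_DIC1 = {
    "isolation": [2.97, 3.37, 4.67],
    "asymmetric migration": [3.13, 2.79, 4.10],
    "migration": [3.02, 3.02, 3.82],
}

def preferred_models(dic_values, names, tol=1e-9):
    """Return every model whose DIC equals the smallest DIC (ties are kept)."""
    best = min(dic_values)
    return [name for name, value in zip(names, dic_values)
            if abs(value - best) <= tol]

for simulated, values in MEAN_DIC1.items():
    print(simulated, "->", preferred_models(values, MODELS))
```

For data simulated under the migration model, the isolation and asymmetric migration models have the same mean DIC1, which is consistent with the statement that no strong decision can be made in that case.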

Human data analysis

In this section, we used the deviance information criterion to question the replacement of Neanderthals by modern humans using modern human DNA. Genetic data based on resequencing from noncoding regions were utilized to discriminate between models of divergence incorporating various levels of admixture between the two human groups (Laval et al 2010). We used sequence variation surveyed in DNA samples from 213 healthy donors. The panel included 118 sub-Saharan African individuals, 47 European individuals, and 48 East-Asian individuals. Following Laval et al (2010), we re-analyzed 20 autosomal regions (27 kb per individual, mean sequence length per region of 1.33 kb) that met criteria of selective neutrality. The regions were selected to be independent from each other, to reside at least 200 kb apart from any known or predicted gene or spliced expressed sequence tag, and not to be in linkage disequilibrium with any known or predicted gene.


[Figure 5 plot. Horizontal axis: Divergence, Unidirectional gene flow, Bidirectional gene flow. Vertical axis: 0% to 100%. Legend: Unidirectional gene flow, Divergence.]

Figure 5: Population structure and migration scenarios (100 replicates). Model choice using the deviance information criterion, DIC1, when a particular scenario is assumed. See the text for the definition of models. The frequencies were computed over 100 independent replicates from each model.

To run an ABC analysis, we used a coalescent-based algorithm implemented in SIMCOAL 2 (Laval and Excoffier 2004) to generate synthetic data that consisted of 20 independent DNA sequences of 1.4 kb each. The mutation and the recombination rates of each region were drawn from gamma distributions (Table 3). The evolutionary scenarios assumed an early diffusion of archaic hominids out of Africa between ∼1.25 and ∼2.25 million years ago and an African exodus of modern humans between ∼40,000 and ∼100,000 years ago (Fagundes et al 2007). By tuning the replacement rate, δ, we considered various levels of introgression of archaic genetic material into the modern human gene pool. Nineteen models differing in their prior distributions on δ were considered. In these models, each prior distribution was uniform over an interval of length 0.1, from 0 < δ < 0.1 to 0.9 < δ < 1. Each interval was deduced from the previous one by a translation of h = 0.05. Our objective was to discriminate among models with low, medium or high levels of admixture.
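The nineteen prior intervals on δ follow directly from the interval length (0.1) and the translation step (h = 0.05). The sketch below enumerates them; the function names `replacement_rate_priors` and `sample_delta` are illustrative and not part of any published software.

```python
import random

def replacement_rate_priors(n_models=19, width=0.1, step=0.05):
    """List the (lower, upper) bounds of the uniform priors on delta."""
    # Rounding removes floating-point noise such as 0.15000000000000002.
    return [(round(k * step, 2), round(k * step + width, 2))
            for k in range(n_models)]

def sample_delta(interval, rng):
    """Draw a replacement rate uniformly from one prior interval."""
    lower, upper = interval
    return rng.uniform(lower, upper)

intervals = replacement_rate_priors()
print(len(intervals), intervals[0], intervals[-1])

rng = random.Random(1)
print(sample_delta(intervals[-1], rng))
```

Consecutive intervals overlap by half their length, so neighbouring models share part of their prior support.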


[Figure 6 plot. Histograms of four summary statistics (Diversity, Fst, Tajima D, log(H)) in three rows of panels: Divergence model (DIC=2.24), Asymetric migration model (DIC=2.26), Bidirectional migration model (DIC=2.62). Vertical axis: Frequency.]

Figure 6: Posterior predictive checks for the three isolation with migration models. Data were simulated under the bidirectional migration model (third model). The vertical bars correspond to the observed summary statistics. The summary statistics were within the tails of the posterior predictive distributions under all models.

Parameter                              mean           min            max             prior

DNA parameters
Mutation rate µ                        2.5 × 10^-8    1.3 × 10^-8    5.05 × 10^-8    G
Recombination rate ρ                   10^-8          0.1 × 10^-8    1.5 × 10^-8     G

Demographic parameters
Any population size N                  10000          500            40000           U
Time of exodus from Africa TE          1.88 × 10^6    1.25 × 10^6    2.5 × 10^6      U
Time of European-Asian split TE−EA     25010          12520          37500           U
Migration rate m                       2 × 10^-4      10^-6          4 × 10^-3       ND

Table 3: Description of the prior distributions of simulated parameters. U and G denote uniform and gamma distributions, and ND stands for not drawn. Times are expressed in number of years (generation time of 25 years). The modern migration rate, m, is the proportion of migrants after the Out-of-Africa exodus. The mutation rate, µ, is expressed per generation per base, and the recombination rate, ρ, is expressed per generation per pair of adjacent bases.

Under equilibrium assumptions, the human effective population size has been estimated at ∼10,000 individuals on the basis of human-chimpanzee divergence and intra-specific linkage disequilibrium levels (Harpending et al 1998). To give population size a degree of freedom and to match a consensus estimate of human populations, we defined a gamma prior distribution with a mean of 10,000 individuals and a 0.95 confidence interval of 3,000 to 21,000 individuals (Table 3). In all models we considered a constant size for the African, Asian and European modern populations (Laval et al 2010). For each genomic region, we specifically computed global and pairwise FST, based on haplotype frequencies, the number of haplotypes, K, the number of polymorphisms, S, the nucleotide diversity, π, Tajima's D, and the expected heterozygosity. We also computed the variance between regions for π and D. The summary statistics were calculated by merging all population samples (except for population differentiation indices) in order to minimize the effects of recent demographic events related to the continental populations.
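Several of the per-region summary statistics listed above can be computed from aligned sequences with elementary code. The following sketch is a minimal illustration, not the authors' pipeline: it assumes sequences of equal length given as strings, reports π as the mean number of pairwise differences per region, and uses the standard formula of Tajima (1989) for D. The function name `summary_statistics` is introduced here for illustration.

```python
from itertools import combinations
from math import sqrt

def summary_statistics(seqs):
    """Return (S, K, pi, D) for a list of at least two aligned sequences."""
    n = len(seqs)
    length = len(seqs[0])
    # S: number of polymorphic (segregating) sites
    S = sum(1 for j in range(length) if len({s[j] for s in seqs}) > 1)
    # K: number of distinct haplotypes
    K = len(set(seqs))
    # pi: mean number of pairwise differences between sequences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    if S == 0:
        return S, K, pi, float("nan")  # D is undefined without polymorphism
    # Constants of Tajima's D (Tajima 1989)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    D = (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
    return S, K, pi, D

# Toy alignment of four sequences (made-up data, for illustration only)
region = ["AAAA", "AAAT", "AATT", "ATTT"]
print(summary_statistics(region))
```

Applying the function to each of the 20 regions and taking the variance of the resulting π and D values would give the between-region variances mentioned in the text.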

[Figure 7 plot. Horizontal axis: replacement rate intervals, with ticks labelled [0,0.1], [0.2,0.3], [0.4,0.5], [0.6,0.7], [0.8,0.9]. Vertical axis: ticks from 10 to 40. Legend: Expected Deviance, DIC.]

Figure 7: Replacement of Neanderthals by modern humans. Expected deviance, D̄1, and DIC1 for models with different prior distributions on the replacement rate, δ. The prior distributions are uniform over intervals of length 0.1. Each interval is deduced by translation of h = 0.05 from the previous one.

Considering 19 models with distinct prior distributions for the replacement rate of Neanderthals by modern humans, we found a decreasing trend both in the expected deviance D̄1 and in DIC1. For high values of the replacement rate, δ, D̄1 and DIC1 were close to each other (Figure 7). These indices favored models exhibiting high values of the parameter δ and low levels of genetic introgression of the human genetic pool by Neanderthal genes (Green et al 2010).

Conclusion

Model selection using ABC algorithms is notoriously difficult (Csilléry et al 2010b), and it has been the topic of an intense debate in evolutionary genetics (Templeton 2009, Beaumont et al 2010, Robert et al 2011). As we have shown in this study, some approximations of model probabilities computed by counting replicates from each model falling at a distance smaller than a given value, ε, to the observed data can potentially lead to systematic errors in ABC, especially when ε is not small. Using small acceptance rates implies, however, that gigantic numbers of simulations are performed, especially when more than ten summary statistics are used.

Our purpose here was not to argue that any model choice previously performed in ABC studies on the basis of approximate model probabilities is unreliable. Indeed, inconsistencies in model predictions may still be detected by using standard posterior predictive model checking procedures. In addition, we do not argue that regression adjustments are problematic, and the results obtained under our simulation models should motivate users to apply these corrections systematically.

Regression adjustments can correct the posterior values obtained from the rejection algorithm, but they have no effect on posterior model probabilities. To overcome this potential issue, we argue that model selection should be done for the statistical distributions that correspond to the transformed model, and we propose an approach based on the evaluation of posterior predictive quantities. Our solution is based on the formulation of an approximate deviance, and bypasses the estimation of posterior model probabilities. Our simulation study showed that the concepts of approximate deviance provide reasonable answers to the model choice issue in the population genetic examples tested, which are representative of this field (Beaumont 2010, Nielsen and Wakeley 2001).

Finally, we emphasize that since the computation of DIC is based on posterior predictive distributions, this approach applies to any type of ABC algorithm.
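The counting approximation of model probabilities discussed in this conclusion can be illustrated on a toy problem. The sketch below is not the authors' implementation: the two models, the one-dimensional summary statistic, and the names `abc_model_probabilities`, `model_a` and `model_b` are hypothetical choices made only to show how accepted replicates are counted within a tolerance ε (written `eps`).

```python
import random

def abc_model_probabilities(observed, simulators, n_sims, eps, rng):
    """Approximate model probabilities by counting accepted replicates.

    Every model receives the same number of simulations, which amounts to
    a uniform prior on models. A replicate is accepted when its summary
    statistic lies at a distance smaller than eps from the observed one.
    """
    accepted = {name: 0 for name in simulators}
    for name, simulate in simulators.items():
        for _ in range(n_sims):
            if abs(simulate(rng) - observed) < eps:
                accepted[name] += 1
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no replicate accepted; increase eps or n_sims")
    return {name: count / total for name, count in accepted.items()}

# Two hypothetical toy models: prior draw, then a simulated summary statistic.
def model_a(rng):
    return rng.gauss(rng.uniform(-1.0, 1.0), 1.0)

def model_b(rng):
    return rng.gauss(rng.uniform(2.0, 4.0), 1.0)

rng = random.Random(42)
probs = abc_model_probabilities(0.0, {"A": model_a, "B": model_b},
                                n_sims=5000, eps=0.2, rng=rng)
print(probs)
```

Increasing `eps` raises the acceptance rate but moves the counts away from the target posterior, which is the trade-off between tolerance and number of simulations described above.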

References

Akaike H. (1974). A new look at the statistical model identification. IEEE T Automat Control 19: 716-723.

Bazin E., K.J. Dawson and M.A. Beaumont (2010). Likelihood-free inference of population structure and local adaptation in a Bayesian hierarchical model. Genetics 185: 587-602.

Beaumont M.A., W. Zhang and D.J. Balding (2002). Approximate Bayesian computation in population genetics. Genetics 162: 2025-2035.


Beaumont M.A. (2008). Joint determination of topology, divergence time, and immigration in population trees. In Simulation, Genetics, and Human Prehistory (Matsumura, S., Forster, P. and Renfrew, C., eds) McDonald Institute for Archaeological Research.

Beaumont M.A., J-M. Cornuet, J-M. Marin and C.P. Robert (2009). Adaptive approximate Bayesian computation. Biometrika 96: 983-990.

Beaumont M.A., R. Nielsen, C.P. Robert, J. Hey, O.E. Gaggiotti et al (2010). In defence of model-based inference in phylogeography. Mol Ecol 19: 436-446.

Beaumont M.A. (2010). Approximate Bayesian Computation in Evolution and Ecology. Annu Rev Ecol Evol Syst 41: 379-406.

Bortot P., S.G. Coles and S.A. Sisson (2007). Inference for stereological extremes. J Am Stat Assoc 102: 84-92.

Blum M.G.B. and O. François (2010). Non-linear regression models for Approximate Bayesian Computation. Stat Comput 20: 63-73.

Burnham K.P. and D.R. Anderson (2002). Model Selection and Multimodel Inference: a Practical Information-Theoretic Approach. 2nd Edition. Springer-Verlag, New York, USA.

Carstens B.C., H.N. Stoute and N.M. Reid (2009). An information-theoretical approach to phylogeography. Mol Ecol 18: 4270-4282.

Celeux G., F. Forbes, C.P. Robert and D.M. Titterington (2006). Deviance information criteria for missing data models (with Discussion). Bayesian Analysis 1: 651-706.

Cornuet J-M., F. Santos, M.A. Beaumont, C.P. Robert, J-M. Marin et al (2008). Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713-2719.


Csilléry K., M.G.B. Blum, O.E. Gaggiotti and O. François (2010a). Approximate Bayesian Computation in practice. Trends Ecol Evol 25: 410-418.

Csilléry K., M.G.B. Blum, O.E. Gaggiotti and O. François (2010b). Invalid arguments against ABC: A reply to AR Templeton. Trends Ecol Evol 25: 490-491.

Csilléry K., O. François and M.G.B. Blum (2011). abc: an R package for Approximate Bayesian Computation (ABC). arXiv:1106.2793.

Didelot X., R. Everitt, A. Johansen and D. Lawson (2011). Likelihood-free estimation of model evidence. Bayesian Analysis 6: 48-76.

Fagundes N.J., N. Ray, M.A. Beaumont, S. Neuenschwander, F.M. Salzano et al (2007). Statistical evaluation of alternative models of human evolution. Proc Natl Acad Sci USA 104: 17614-17619.

Fay J.C. and C.I. Wu (2000). Hitchhiking under positive Darwinian selection. Genetics 155: 1405-1413.

François O., M.G.B. Blum, M. Jakobsson and N.A. Rosenberg (2008). Demographic history of European populations of Arabidopsis thaliana. PLoS Genet 4: e1000075.

Gelman A., J.B. Carlin, H.S. Stern and D.B. Rubin (2004). Bayesian Data Analysis (2nd edn), Chapman and Hall, New York.

Green R., J. Krause, A.W. Briggs, T. Maricic, T. Stenzel et al (2010). A draft sequence of the Neandertal genome. Science 328: 710-722.

Grelaud A., C.P. Robert, J-M. Marin, F. Rodolphe and J.F. Taly (2009). ABC likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 4: 317-336.

Harpending H.C., M.A. Batzer, M. Gurven, L.B. Jorde, A.R. Rogers et al (1998). Genetic traces of ancient demography. Proc Natl Acad Sci USA 95: 1961-1967.


Hey J. and C.A. Machado (2003). The study of structured populations: a difficult and divided science has new hope. Nat Rev Genet 4: 535-543.

Hudson R.R., M. Slatkin and W.P. Maddison (1992). Estimation of levels of gene flow from DNA sequence data. Genetics 132: 583-589.

Hudson R.R. (2002). Generating samples under a Wright-Fisher neutral model. Bioinformatics 18: 337-338.

Johnson J.B. and K.S. Omland (2004). Model selection in ecology and evolution. Trends Ecol Evol 19: 101-108.

Jeffreys H. (1961). The Theory of Probability. Oxford University Press, Oxford UK.

Laval G. and L. Excoffier (2004). SIMCOAL 2.0: A program to simulate genomic diversity over large recombining regions in a subdivided population with a complex history. Bioinformatics 20: 2485-2487.

Laval G., E. Patin, L.B. Barreiro and L. Quintana-Murci (2010). Formulating a historical and demographic model of recent human evolution based on resequencing data from noncoding regions. PLoS ONE 5(4): e10284.

Leuenberger C. and D. Wegmann (2010). Bayesian computation and model selection without likelihoods. Genetics 184: 243-252.

Lopes J. and M.A. Beaumont (2010). ABC: A useful Bayesian tool for the analysis of population data. Infect Genet Evol 10: 825-832.

Marjoram P., J. Molitor, V. Plagnol and S. Tavaré (2003). Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci USA 100: 15324-15328.

Marjoram P. and S. Tavaré (2006). Modern computational approaches for analysing molecular genetic variation data. Nat Rev Genet 7: 759-770.

Nielsen R. and J. Wakeley (2001). Distinguishing migration from isolation: A Markov chain Monte Carlo approach. Genetics 158: 885-896.


Patin E., G. Laval, L.B. Barreiro, A. Salas, O. Semino et al (2009). Inferring the demographic history of African farmers and Pygmy hunter-gatherers using a multilocus resequencing data set. PLoS Genet 5: e1000448.

Pritchard J.K., M.T. Seielstad, A. Perez-Lezaun and M.W. Feldman (1999). Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol Biol Evol 16: 1791-1798.

R Development Core Team (2010). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Ratmann O., C. Andrieu, C. Wiuf and S. Richardson (2009). Model criticism based on likelihood-free inference. Proc Natl Acad Sci USA 106: 10576-10581.

Ripley B.D. (2004). Selecting amongst large classes of models. In Methods and Models in Statistics: In Honour of Professor John Nelder, FRS (Adams, N., Crowder, M., Hand, D.J. and Stephens, D., eds), pp. 155-170. Imperial College Press.

Robert C.P., J-M. Cornuet, J-M. Marin and N.S. Pillai (2011). Lack of confidence in approximate Bayesian computational (ABC) model choice. arXiv:1102.4432.

Sisson S.A., Y. Fan and M.M. Tanaka (2007). Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci USA 104: 1760-1765.

Spiegelhalter D.J., N.G. Best, B.P. Carlin and A. van der Linde (2002). Bayesian measures of model complexity and fit. J R Stat Soc Ser B 64: 583-639.

Tavaré S. (2004). Ancestral Inference in Population Genetics. In: École d'Été de Probabilités de Saint-Flour XXXI 2001, Lecture Notes in Mathematics (Cantoni O., Tavaré S. and Zeitouni O., eds). Springer-Verlag, Berlin.

Tajima F. (1989). Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585-595.


Templeton A.R. (2009). Statistical hypothesis testing in intraspecific phylogeography: NCPA versus ABC. Mol Ecol 18: 319-331.

Thornton K.R. and P. Andolfatto (2006). Approximate Bayesian inference reveals evidence for a recent, severe, bottleneck in a Netherlands population of Drosophila melanogaster. Genetics 172: 1607-1619.

Thornton K.R. (2009). Automating approximate Bayesian computation by local linear regression. BMC Genet 10: 35.

Toni T., D. Welch, N. Strelkowa, A. Ipsen and M.P. Stumpf (2009). Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6: 187-202.

Toni T. and M.P. Stumpf (2010). Simulation-based model selection for dynamical systems in systems and population biology. Bioinformatics 26: 104-110.

Wegmann D., C. Leuenberger and L. Excoffier (2009). Efficient approximate Bayesian computation coupled with Markov Chain Monte Carlo without likelihood. Genetics 182: 1207-1218.

Wilkinson R. (2008). Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. arXiv:0811.3355v1 [stat.CO].
