A Bayesian Kepler Periodogram Detects a Second in HD 208487

P.C. Gregory

Technical Report #2006-5 August 17, 2006

This material was based upon work supported by the National Science Foundation under Agreement No. DMS-0112069. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Statistical and Applied Mathematical Sciences Institute PO Box 14006 Research Triangle Park, NC 27709-4006 www.samsi.info Mon. Not. R. Astron. Soc. 000, 1–12 (2006) Printed 17 August 2006 (MN LATEX style file v2.2)

A Bayesian Kepler Periodogram Detects a Second Planet in HD 208487

P. C. Gregory1??† 1Physics and Astronomy Department, University of British Columbia, 6224 Agricultural Rd., Vancouver, British Columbia, V6T 1Z1, Canada

Submitted August 16, 2006

ABSTRACT An automatic Bayesian Kepler periodogram has been developed for identifying and characterizing multiple planetary orbits in precision data. The peri- odogram is powered by a parallel tempering MCMC algorithm which is capable of efficiently exploring a multi-planet model parameter space. Improvements in the peri- odogram and further tests using data from HD 208487 have resulted in the detection 81 0.26 of a second planet with a period of 908−94d, an eccentricity of 0.38−0.20, a semi-major 0.09 0.05 axis of 1.80−0.08 AU and an M sin i =0.46−0.13 MJ. The revised parameters of the first planet are period = 129.8 ± 0.4d, eccentricity = 0.21 ± 0.09, semi-major axis 0.001 0.04 =0.492−0.002 AU and M sin i =0.37−0.03 MJ. Particular attention is paid to sev- eral methods for calculating the Bayes factor which is used to compare models with different numbers of . Key words: Extrasolar planets, Bayesian methods, model selection, time series anal- ysis, periodogram, HD 208487.

1 INTRODUCTION nates the need for a separate periodogram search for trial orbital periods which typical assume a sinusoidal model for The discovery of multiple planets orbiting the Pulsar the signal that is only correct for a circular orbit. In addi- PSR B1257+12 (Wolszczan & Frail, 1992), ushered in tion, the Bayesian MCMC algorithm provides full marginal an exciting new era of astronomy. Fifteen later, parameters distributions for all the orbital elements that can over 190 extra-solar planets have been discovered by be determined from radial velocity data. The samples from a variety of techniques, including precision radial ve- the parallel chains can also be used to compute the marginal locity measurements which have detected the major- likelihood for a given model (Gregory 2005a) for use in com- ity of planets to date (Extrasolar Planets Encyclopedia, puting the Bayes factor that is needed to compare models http://vo.obspm.fr/exoplanetes/encyclo/index.php). It is to with different numbers of planets. The parallel tempering be expected that continued monitoring and increased preci- MCMC algorithm employed in this work includes a con- sion will permit the detection of lower amplitude planetary trol system that automates the selection of efficient Gaus- signatures. The increase in parameters needed to model mul- sian parameter proposal distributions. This feature makes tiple planetary systems is motivating efforts to improve the it practical to carry out blind searches for multiple planets statistical tools for analyzing the radial velocity data (e.g. simultaneously. Ford & Gregory 2006, Ford 2005 & 2006, Gregory 2005b, This paper outlines improvements to the parallel tem- Cumming 2004). Much of the recent work has highlighted pering MCMC algorithm that allow for improved mixing a Bayesian MCMC approach as a way to better understand and more efficient convergence. In addition several differ- parameter uncertainies and degeneracies. ent methods for computing the marginal likelihood are com- Gregory (2005a, b & c) presented a Bayesian MCMC pared. We also confirm our earlier discovery (Gregory 2005c) algorithm that makes use of parallel tempering to efficiently of a second planet in HD 208487. explore the full range of model parameter space starting from a random location. It is able to identify any significant periodic signal component in the data that satisfies Kepler’s laws, thus functions as a Kepler periodogram. This elimi- 2 PARALLEL TEMPERING A simple Metropolis-Hastings MCMC algorithm can run ? E-mail: [email protected] into difficulties if the target probability distribution is multi- † http://www.physics.ubc.ca/ gregory/gregory.html modal with widely separated peaks. It can fail to fully ex-

c 2 P. C. Gregory plore all peaks which contain significant probability, espe- pipeline used to reduce their raw spectra to radial velocities cially if some of the peaks are very narrow. This is frequently has resulted in a new data set of improved radial velocities the case with extrasolar planet data which is typically very together with the addition of 4 new measurements (Butler sparsely sampled. The problem is somewhat similar to the et al. 2006). Most of the results pertain to the new data one encountered in finding a global χ2 minimum in a nonlin- set, but some of the results dealing with model selection in ear model fitting problem. One solution to finding a global Section 7 make use of the old data set and this is indicated minimum is to use simulated annealing by introducing a in the text. temperature parameter T which is gradually decreased. In parallel tempering, multiple copies of the MCMC simulation are run in parallel, each at a different tempera- 2.1 Proposal distributions ture. Mathematically, we can describe these tempering dis- In Metropolis-Hastings versions of MCMC, parameter pro- tributions by posals are drawn from a proposal distribution. In this anal- ysis, independent Gaussian proposal distributions, one for π(X|D, β,M ,I)=Cp(X|M ,I)p(D|M ,X,I)β 1 1 1 each parameter, were employed. Of course, for sparse data = Cp(X|M1,I) × sets it can often be the case that some of the parameters exp(β ln[p(D|M1,X,I)]), (1) are highly correlated resulting in inefficient sampling from for 0 <β<1 independent Gaussians. One solution is to use combinations of model parameters that are more independent. More on where X stands for the set of model parameters. The nor- this later. In general, the σ’s of these Gaussian proposal dis- malization constant, C, is unimportant and will be dropped. tributions are different because the parameters can be very Rather than use a temperature which varies from 1 to in- different entities. Also if the σ’s are chosen to small, suc- finity, we use its reciprocal, β =1/T , and refer to as the cessive samples will be highly correlated and it will require tempering parameter. Thus β varies from 1 to zero. many iterations to obtain an equilibrium set of samples. If One of the simulations, corresponding to β =1, the σ’s are too large, then proposed samples will very rarely is the desired target probability distribution. The other be accepted. Based on empirical studies, Roberts, Gelman simulations correspond to a ladder of higher temperature and Gilks (1997) recommend calibrating the acceptance rate distributions. Let nβ equal the number of parallel MCMC to about 25% for a high-dimensional model and to about simulations. At intervals, a pair of adjacent simulations on 50% for models of 1 or 2 dimensions. Although these stud- this ladder are chosen at random and a proposal made to ies were based on a multinormal target probability distribu- swap their parameter states. A Monte Carlo acceptance tions, they have proven a useful guideline for our application rule determines the probability for the proposed swap to as well. occur. For β = 1, the distribution is the desired target The process of choosing a set of useful proposal σ’s distribution which is referred to as the cold sampler. For when dealing with a large number of different parameters β  1, the distribution is much flatter. This swap allows can be very time consuming. In parallel tempering MCMC, for an exchange of information across the population of the problem is compounded because of the need for a sep- parallel simulations. In the higher temperature simulations, arate set of proposal σ’s for each simulation. We have au- radically different configurations can arise, whereas in tomated this process using a two stage statistical control higher β (lower temperature) states, a configuration is system (CS) in which the error signal is proportional to the given the chance to refine itself. Some experimentation difference between the current acceptance rate and a target is needed to refine suitable choices of β values, which acceptance rate, typically 25%. In the first stage an initial can be assessed by examining the swap acceptance rate set of proposal σ’s (≈ 10% of the prior range for each param- between adjacent simulations. If adjacent simulations do eter) are separately perturbed to determine an approximate not have some overlap the swap rate between them will gradient in the acceptance rate with respect to the proposal be very low. The number of β values required depends σ’s. The σ’s are then jointly modified by a small increment on the application. For parameter estimation purposes in the direction of this gradient. This is done for each of the a typical value of nβ = 12. It is important that the parallel simulations or chains as they are sometimes called. iterations from the lowest value of β explore the full range When the σ’s are large all the MCMC simulations explore of the prior parameter space. For HD 208487, the set β = the full prior distribution and locate significant probability {0.05, 0.1, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.70, 0.80, 0.90, 1.0} peaks in the joint parameter space. As the proposal σ’s de- proved useful and achieved a typical acceptance rate be- crease these peaks are more efficiently explored in the β =1 tween adjacent levels of > 50%. The mean number of simulation. This annealing of the proposal σ’s typically takes iterations between swap proposals was set = 10. Final place over the first 5,000 to 150,000 (unthinned) iterations inference is based on samples drawn from the β =1.0 for one planet and first 5,000 to 300,000 iterations for two simulation. planets. This may seem like an excessive number of itera- It is possible to use the samples from the hotter simula- tions but keep in mind that we are dealing with sparse data tions to evaluate the marginal (global) likelihood needed for sets that can have multiple, widely separated probability model selection (see Section 12.7 of Gregory 2005a). For HD peaks and we want the MCMC to locate the most signifi- 208487, 34 β levels were used spanning the range β =10−8 cant probability peak before finalizing the choice of proposal to 1.0. This is discussed more in Section 7. σ’s. Some of the analysis presented in this paper was based Although the acceptance rate for the final joint set of on the original 31 radial velocity measurements (old data) parameter σ’s is achieved, it often happens that a subset of given in Tinney et al. (2005). Recent improvements in the the proposal σ’s will be too small leading to excessive cor-

c A Bayesian Kepler Periodogram Detects a Second Planet in HD 208487 3 relation in the MCMC iterations for these parameters. The varied ψ =2πχ+ω and φ =2πχ−ω. ψ is well determined for second stage CS corrects for this. In general, the burn-in all eccentricities. Although φ is not well determined for low period occurs within the span of the first stage CS, i.e., the eccentricities, it is at least orthogonal to the ψ parameter. significant peaks in the joint parameter probability distribu- It is easy to demonstrate that uniform sampling of ψ in the tion are found, and the second stage improves the choice of interval 0 to 2π and uniform sampling of φ in the interval proposal σ’s for the highest probability parameter set. Oc- −2π to +2π results in uniform coverage of χ in the interval casionally, a new higher probability parameter set emerges 0 to 1 and uniform coverage of ω from 0 to 2π using the at a later iteration. The second phase of the control system relations can detect this and compute a new set of proposal σ’s. If this ψ + φ χ = Modulus h , 1i happens the control system resets the burn-in period to in- 4π clude all previous iterations. The useful MCMC simulation ψ − φ ω = Modulus h , 2πi (4) data is obtained after the CS is switched off. Although inclu- 2 sion of the control system may result in a somewhat longer effective burn-in period, there is a huge saving in time be- Restricting π to the interval 0 to 2π produces an hourglass cause it eliminates many trial runs to manually establish a coverage of half the χ, ω plane. This re-parameterization, suitable set of proposal σ’s. together with the additional second stage CS, achieved good mixing of the MCMC iterations for a wide range of orbital eccentricities.

3 RE-PARAMETERIZATION Ford (2006) examined the effect of a variety of re- 4 FREQUENCY SEARCH parameterizations for the Kepler model on the MCMC con- For the Kepler model with sparse data the target probability vergence speed. By far the biggest improvement was ob- distribution can be very spiky. This is particularly a prob- tained with re-parameterizations that involved ω, the argu- lem for the parameters which span roughly ment of periastron, and M , the mean anomaly at initial 0 6 decades. In general, the sharpness of the peak depends in . This is because for low eccentricity orbits ω can be part on how many periods fit within the duration of the data. poorly constrained but the observational data can often bet- The previous implementation of the parallel tempering al- ter constrain the ω + M . 0 gorithm employed a proposal distribution for P which was a In our analysis the predicted radial velocity is given Gaussian in the logarithmic of P . This resulted in a constant v(ti)=V + K[cos{θ(ti + χP )+ω} + e cos ω], (2) fractional period resolution instead of a fixed absolute reso- lution, increasing the probability of detecting narrow spec- and involves the 6 unknowns tral peaks at smaller values of P . However, this proved not to V = a constant velocity. be entirely satisfactory because for the HD 73526 data set of K = velocity semi-amplitude. Tinney et al. (2003), one of the three probability peaks (the P = the orbital period. highest) was not detected in two out of five trials (Gregory e = the . 2005b). ω = the longitude of periastron. Our latest algorithm implements the search in fre- χ = the fraction of an orbit, prior to the start of data quency space for the following reason. In a Bayesian analysis, taking, that periastron occurred at. Thus, χP = the number the width of a spectral peak, which reflects the accuracy of of days prior to ti = 0 that the was at periastron, for the frequency estimate, is determined by the duration of the an orbital period of P days. data, the signal-to-noise (S/N) ratio and the number of data θ(ti + χP ) = the angle of the star in its orbit relative points. More precisely (Gregory 2005a), for a sinusoidal sig- to periastron at time ti, also called the true anomaly. nal model, the standard deviation of the spectral peak, δf, for a S/N > 1, is given by We utilize this form of the equation because we obtain S √ −1 the dependence of θ on ti by solving the conservation of δf ≈ 1.6 T N Hz, (5) angular momentum equation N 2 where T = the data duration in s, and N = the number of dθ 2π[1 + e cos θ(ti + χP)] − =0. (3) T dt P (1 − e2)3/2 data points in . The thing to notice is that the width of any peak is independent of the frequency of the peak. Thus Our algorithm is implemented in Mathematica and it proves the same frequency proposal distribution will be efficient for faster for Mathematica to solve this differential equation all frequency peaks. This is not the case for a period search than solve the equations relating the true anomaly to the where the width of a spectral peak is ∝ P 2. With a frequency mean anomaly via the eccentric anomaly. Mathematica gen- search strategy, a re-analysis of the original HD 73526 data erates an accurate interpolating function between t and χ set resulted in all three peaks being detected in five out of so the differential equation does not need to be solved sepa- five trials. rately for each ti. Evaluating the interpolating function for each ti is very fast compared to solving the differential equa- tion, so the algorithm should be able to handle much larger 5 CHOICE OF PRIORS samples of radial velocity data than those currently available without a significant increase in computational time. In a Bayesian analysis we need to specify a suitable prior Instead of varying χ and ω at each MCMC iteration we for each parameter. We first address the question of what

c 4 P. C. Gregory prior to use for frequency for multi-planet models. In this frequency range and the parameters re-labeled afterwards. work, the lower cutoff in period of 1d is chosen to be less In this second approach nothing constrains f1 to always be than the smallest orbital period of known planets but some- below f2 so that degenerate parameter peaks can occur. In what larger than the Roche limit which occurs at 0.2d for general, for a two planet model there are twice as many a10MJup planet around a 1M star. The upper period peaks in the probability distribution possible compared with period cutoff of 1000yr is much longer than any known ex- (a). For a n planet model, the number of possible peaks is trasolar planet, but corresponds roughly to a period where n! more than in (a). Provided the parameters are re-labeled perturbations from passing and the galactic tide would after the MCMC, such that parameters associated with the disrupt the planet’s orbit (Ford and Gregory 2006). For a lower frequency are always identified with planet one and single planet model we use a Jeffreys prior because the prior vice versa, the two cases are equivalent 1 and equation (12) period (frequency) range spans almost 6 decades. A Jeffreys is the appropriate prior for both approaches. prior corresponds to a uniform probability density in ln f. Approach (b) was found to be more successful because This says that the true frequency is just as likely to be in the in repeated blind period searches it always converged on the bottom decade as the top. The Jeffreys prior can be written highest posterior probability distribution peak, in spite of in two equivalent ways. the huge period search range. Approach (a) proved to be d ln f unsuccessful in finding the highest peak in some trials and p(ln f|M1,I) d ln f = (6) in those cases where it did find the peak it required many (ln fH − ln fL) more iterations. Restricting P2 > P1 (f1 6 f2) introduces df p(f|M,I) df = (7) an additional hurdle that appears to slow the MCMC period f(ln fH − ln fL) search. What form of frequency prior should we use for a mul- The full set of priors used in our Bayesian calculations tiple planet model? We first develop the prior to be used in are given in Table 1. Two different limits on Ki were em- a frequency search strategy where we constrain the frequen- ployed. In the first case #1, the upper limit corresponds to the velocity of a planet with a = 0.01 M in a circular cies in an n planet search such that (fL 6 f1 6 f2 ··· 6 orbit with our shortest period of one . Also the upper fn 6 fH ). From the product rule of probability theory and the above frequency constraints we can write bound on Pi of 1000 yr is an upper bound based on galac- tic tidal disruption. Previously we used an upper limit of p(ln f1, ln f2, ···ln fn|Mn,I)=p(ln fn|Mn,I) three times the duration of the data. An upper bound of 1/3 Pmin ×p(ln fn−1| ln fn,Mn,I) ···p(ln f2| ln f3,Mn,I) Kmax was proposed at an workshop at Pi  ×p(ln f1| ln f2,Mn,I). (8) the Statistics and Applied Math Sciences Institute (spring 1/3 2006), however, the factor of Pmin was not incorporated For model selection purpose we need to use a normalized Pi  prior which translates to the requirement that in early runs of the current analysis. We set Kmax = 2129m s−1, which corresponds to a maximum planet-star mass ratio ln f Z H of 0.01. p(ln f1, ln f2, ···ln fn|Mn,I)d ln f1 ···d ln fn =1. (9) For case #2, the upper limit on Ki was set equal to ln f L 1/3 Pmin 1 Kmax P  based on equation (14). We assume that p(ln f , ln f , ···ln f |M ,I) is equal to a i 1−e2 1 2 n n p i constant k everywhere within the prior volume. We can solve 1/3 −2/3 m sin i 2πGM∗ m for k from the integral equation K =   1+  , (14) ln f ln f ln f M∗ P M∗ Z H Z n Z 2 k d ln fn d ln fn−1 ··· d ln f1 =1. (10) where m is the planet mass, M∗ is the star’s mass, and G is ln fL ln fL ln fL the gravitational constant. Case #2 is an improvement over 1/3 Pmin The solution to equation (10) is Kmax because it allows the upper limit on K to Pi  n! depend on the orbital eccentricity. Clearly, the only chance k = . (11) n we have of detecting an orbital period of 1000 yr with current (ln fH − ln fL) data sets is if the eccentricity is close to one and we are lucky The joint frequency prior is then enough to capture periastron passage. Prior #2 was used in n! this work with the exception of Section 7 on model selection. p(ln f , ln f , ···ln f |M ,I)= (12) 1 2 n n n (ln fH − ln fL) In that section, some results were obtained with prior #1 and the rest with prior #2, as indicated in the text. Expressed as a prior on frequency, equation (11) becomes Our analysis also incorporates an additional noise pa- n! rameter, s, that can allow for any additional noise beyond p(f ,f , ···f |M ,I)= (13) 1 2 n n n 2 f1f2 ···fn(ln fH − ln fL) the known measurement uncertainties We assume the noise variance is finite and adopt a Gaussian distribution with We note that a similar result, involving the factor n! in the a variance s2. Thus, the combination of the known errors numerator, was obtained by Bretthorst (2003) in connection with a uniform frequency prior. Two different approaches to searching in the frequency parameters were employed in 1 To date this claim has been tested for n 6 3. this work. In the first approach (a): an upper bound on 2 In the absence of detailed knowledge of the sampling distribu- f1 6 f2 (P2 > P1) was utilized to maintain the identity of tion for the extra noise, other than that it has a finite variance, the the two frequencies. In the second more successful approach maximum entropy principle tells us that a Gaussian distribution (b): both f1 and f2 were allowed to roam over the entire would be the most conservative choice.

c A Bayesian Kepler Periodogram Detects a Second Planet in HD 208487 5

Table 1. Prior parameter probability distributions.

Parameter Prior Lower bound Upper bound

n! P −1 Orbital period Pi 1.0 d 1000 yr log Pmax Pmin  a Velocity Ki (1) Modified Jeffreys 0(K0 =1) Kmax = 2129 (m s−1) −1 1/3 (K+K0) Pmin 1 (2) 0 Kmax  Pi − 2  K P 1/3  1 e log 1+ max min 1 p i K0 Pi  1−e2 p i

−1 V(ms ) −Kmax Kmax

Eccentricity ei Uniform 0 1

Longitude of periastron ωi Uniform −2π 2π

−1 −1 (s+s0) Extra noise s (m s ) 0(s0 =1) Kmax log 1+ smax smin  standard deviation

a Since the prior lower limits for K and s include zero, we used a modified Jeffreys prior of the form 1 1 p(K|M,I)= (15) K K + K0 ln 1+ max K0 

For K  K0, p(K|M,I) behaves like a uniform prior and for K  K0 it behaves like a Jeffreys prior. The Kmax ln 1+ term in the denominator ensures that the prior is normalized in the interval 0 to Kmax. K0  and extra noise has a Gaussian distribution with variance with a period of ≈ 1000 days. Panels (b) and (c) show the 2 2 = σi + s , where σi is the standard deviation of the known best fitting two planet light curve and residuals based on the noise for ith data point. For example, suppose that the star more detailed analysis of the new data which is presented in actually has two planets, and the model assumes only one this paper. is present. In regard to the single planet model, the velocity variations induced by the unknown second planet acts like an Figure 2 shows β = 1 MCMC iterations for each of the additional unknown noise term. Stellar jitter, if present, will parameters starting from a specific but arbitrary initial lo- also contribute to this extra noise term. In general, nature cation in parameter space of P1 = 50 d, e1 =0.3, V =2.0 −1 −1 is more complicated than our model and known noise terms. ms , ψ1 =2.0 radians, K1 = 20 ms , φ1 =0.0 radi- Marginalizing s has the desirable effect of treating anything ans, P2 = 700 d, e2 =0.1, ψ2 =2.0 radians, K2 =15 −1 −1 in the data that can’t be explained by the model and known ms , φ2 =0.0 radians, s =3ms . A total of 1.8 mil- measurement errors as noise, leading to the most conserva- lion iterations were used but only every sixth iteration was tive estimates of orbital parameters (see Sections 9.2.3 and stored. For display purposes only every two hundredth point 9.2.4 of Gregory (2005a) for a tutorial demonstration of this is plotted in the figure. The burn-in period of approximately point). If there is no extra noise then the posterior proba- 40,000 iterations is clearly discernable. The dominant solu- bility distribution for s will peak at s = 0. The upper limit tion corresponds to P1 = 129d and P2 = 982d, but there on s was set equal to Kmax. are occasional jumps to other remote periods demonstrat- ing that the parallel tempering algorithm is exploring the full prior range in search of other peaks in the target pos- terior distribution. The χ and ω traces were derived from 6 RESULTS i i the corresponding ψi,φi traces using equation (4). Figure 3 As mentioned in the introduction, the initial analysis was shows the post burn-in iterations for a window in period carried out using the 31 radial velocity measurements from space (125 6 P1 6 135d and 650 6 P1 6 1200d) that Tinney et al. (2005) who reported the detection of a single isolates the dominant peak. All the traces appear to have planet with M sin i =0.45 ± 0.05 in a 130 ± 1 day orbit with achieved an equilibrium distribution. There is a weak corre- an eccentricity of 0.32±0.1. Figure 1 shows the new precision lation between P2 and e2 and weak correlation tail evident radial velocity data for HD 208487 from Butler et al. (2006) between K2,e2, and V . These correlations are shown better who reported a single planet with M sin i =0.52 ± 0.08 in the joint marginal distributions for 6 pairs of parameters in a 130.08 ± 0.51 day orbit with an eccentricity of 0.24 ± in figure 4. Each dot is the result from one iteration. The 0.16. Gregory (2005c) reported results from a preliminary re- lower four panels of this plot nicely illustrate the advantage analysis of the old data set which indicated a second planet of the using the ψi =2πχi + ωi and φi =2πχi − ωi re- 6 P. C. Gregory

40 L -60 L -55 -57.5 30 Like -70 Like

´ -80 ´ -60

L -62.5

1 20 -90 Prior Prior -

H H -65

s -100

10 10 -67.5 10 -110 m

H -70 Log -120 Log 0 0 50000 100000 150000 200000 250005000075000100000125000150000175000200000 Iteration chain Β=1. Iteration -10 H L 200 1 Velocity -20 150 0.8

HaL L 100

d 0.6 H -30 1 e

1 70 0.4 P 50 -1500 -1000 -500 0 500 1000 0.2 Julian day number -2,452,535.7103 30 0 H L 0 50000 100000 150000 200000 0 50000 100000 150000 200000 Iteration Iteration 40 15 4 30 10 L 1 3 1 Ω - 5 s + L

1 20 1

m 2 H - 0 Χ s Π V 10 -5 2 1 m H -10 0 0 0 50000 100000 150000 200000 0 50000 100000 150000 200000 -10 Iteration Iteration 6 Velocity -20 20 4 L 1 H L 1 b - Ω 2

s 15

-30 -

m 0 1 H

10 Χ

1 -2 Π K -1500 -1000 -500 0 500 1000 2 5 -4 Julian day number H-2,452,535.7103 L -6 0 50000 100000 150000 200000 0 50000 100000 150000 200000 Iteration Iteration 15 1

L 100000.

1 0.8 - L

s 10 10000 d 0.6 H 2 m e 2 H 0.4 5 P 1000 0.2 0 100 0 0 50000 100000 150000 200000 0 50000 100000 150000 200000 residual -5 Iteration Iteration 50 -10 HcL 6 5 40 L 2 1 Velocity - Ω -15 4 s 30 + m 2

3 H Χ 20 2 Π 2 K -1500 -1000 -500 0 500 1000 2 1 10 Julian day number H-2,452,535.7103 L 0 50000 100000 150000 200000 0 50000 100000 150000 200000 Iteration Iteration

6 14 Figure 1. The new data is shown in panel (a) and the best fitting 4 12 L 2

1 10

Ω 2 - s

- 8 two planet (P = 130 day, P = 1000 day) model versus time is 0

1 2 2 m H Χ 6 -2 Π s 4 shown in (b). Panel (c) shows the residuals. 2 -4 2 -6 0 0 50000 100000 150000 200000 0 50000 100000 150000 200000 Iteration Iteration parameterization, which are essentially uncorrelated in com- 1 1 0.8 0.8 parison to the χi and ωi. 0.6 0.6 1 2 Χ Χ The Gelmen-Rubin statistic is typically used to test for 0.4 0.4 0.2 0.2 convergence of the parameter distributions. In parallel tem- 0 0 0 50000 100000 150000 200000 0 50000 100000 150000 200000 pering MCMC, new widely separated parameter values are Iteration Iteration passed up the line to the β = 1 simulation and are Occa- 6 6 5 5 sionally accepted. Roughly every 100 iterations the β =1 4 4 1 3 2 Ω Ω 3 simulation accepts a swap proposal from its neighboring sim- 2 2 1 1 ulation. If the transition is to a location in parameter space 0 0 0 50000 100000 150000 200000 0 50000 100000 150000 200000 that is very remote from the dominant solution, then the Iteration Iteration β = 1 simulation will have to wait for another swap to return it to the dominant peak region. One example of such a swap- ping operation is shown for the P2 parameter in figure 5. Of Figure 2. MCMC parameter values for 300,000 iteration run. course most swaps are to locations within the equilibrium The upper left panel is a plot of the prior × likelihood, and the distribution of the dominant peak. The final β = 1 simula- upper right panel shows a blow-up of the y-axis after dropping the first 10,000 iterations. tion is thus an average of a very large number of independent β = 1 simulations. What we have done is divide the β =1 iterations into ten equal time intervals and inter compared the ten different essentially independent average distribu- Butler et al. (2006). The values given in the columns labeled tions for each parameter using a Gelmen-Rubin test. For ‘New analysis’ are the median of the marginal probability all of the two planet model parameters the Gelmen-Rubin distribution for the parameter in question and the error bars statistic was 6 1.02. identify the boundaries of the 68.3% credible region. The Figure 6 shows the individual parameter marginal dis- value immediately below in parenthesis is the MAP value, tributions for the dominant solution and Table 2 compares the value at the maximum of the joint posterior probability our two planet orbital parameter values and their errors to distribution. We note that for some of the parameters the the one planet values from a) Tinney et al. (2005) and b) MAP value falls just outside the 68.3% credible region. Fi-

c A Bayesian Kepler Periodogram Detects a Second Planet in HD 208487 7

131 1050 L -54 131 -55 130.5 1000

Like 130.5 -56 950 ´ L L 130 L d d d 130 H H

-57 H 900 1 2 -58 1 129.5 129.5 P P Prior P H -59 850 129 10 -60 129 800 -61 128.5 Log 128.5 750 0 250005000075000100000125000150000 0 250005000075000100000125000150000 Iteration chain HΒ=1.L Iteration 0 0.1 0.2 0.3 0.4 0.5 0 0.2 0.4 0.6 0.8 e e 4 1 2 0.5 2 L

1 50 0.4 -

s 0 20

1 0.3 L L 40 m e

-2 1 1 H 0.2 - 18 - -4 s s V 30 0.1 m m -6 H 16 H 0 1 2 K K 20 0 250005000075000100000125000150000 0 250005000075000100000125000150000 14 Iteration Iteration 10 12 1 24 0 0.1 0.2 0.3 0.4 0.5 0 0.2 0.4 0.6 0.8 L 1 1 22 e1 e2 -

Ω 0.8 s 20 + 6 1 m 1

0.6 H 18 Χ

1 4 Π 16 0.8 K

2 0.4

14 1 2 0.2 12 Ω 0.6 - 1 Χ 0 250005000075000100000125000150000 0 250005000075000100000125000150000 1 0 Χ 0.4 Iteration Iteration Π -2 2

6 1050 -4 0.2 4 1000 1 -6 0 L Ω 2

d 950 1 2 3 4 5 6 0 1 2 3 4 5 6 H - 0 1

2 900 2ΠΧ +Ω Ω

Χ 1 1 1

-2 P Π 850

2 1 -4 6 800 -6 4 0.8 0 250005000075000100000125000150000 0 250005000075000100000125000150000 2 2 Iteration Iteration Ω 0.6 - 2 Χ 2 0 6 Χ 0.4 Π -2 0.8 2 5.5 2 0.2 0.6 Ω -4

+ 5 2 2 e 0.4 -6 0 Χ 4.5

Π 1 2 3 4 5 6 0 1 2 3 4 5 6

0.2 2 4 2ΠΧ2 +Ω2 Ω2 0 0 250005000075000100000125000150000 0 250005000075000100000125000150000 Iteration Iteration 6 Figure 4. Joint marginals for various pairs of parameter values.

L 40 4 2 1 - Ω 2 s 30 -

m 0 2 H 20 Χ -2 2 Π K 10 2 -4 -6 0 250005000075000100000125000150000 0 250005000075000100000125000150000 Iteration Iteration 1 100000. 6 0.8 50000 L 5 1 - L

s 4 0.6 1 d H Χ

m 3 H 0.4 2 2 s 1 0.2 P 10000 0 0 5000 0 250005000075000100000125000150000 0 250005000075000100000125000150000 Iteration Iteration

1 6 0.8 5 1000 4 0.6

2 1 254380 254390 254400 254410 254420

Χ 3 0.4 Ω 2 Iteration 0.2 1 0 0 0 250005000075000100000125000150000 0 250005000075000100000125000150000 Figure 5. A blow-up of a small range of parameter P2 iterations Iteration Iteration illustrating a transition to a location that is very remote from 6 5 the dominant solution that Occasionally results from the parallel 4

2 tempering swap operation. This allows the algorithm to widely 3 Ω 2 explore the full prior parameter space in search of other significant 1 0 peaks. 0 250005000075000100000125000150000 Iteration

7 MODEL SELECTION Figure 3. Post burn-in MCMC iterations for a window in period To compare the posterior probabilities of the two planet space (125 6 P1 6 135d and 650 6 P1 6 1200d) that isolates the dominant peak. model to the one planet models we need to evaluate the odds ratio, O21 = p(M2|D, I)/p(M1|D, I), the ratio of the posterior probability of model M2 to model M1. Application of Bayes’s theorem leads to,

p(M2|I) p(D|M2,I) p(M2|I) O21 = ≡ B21 (16) p(M1|I) p(D|M1,I) p(M1|I) where the first factor is the prior odds ratio, and the sec- nally, Panel (a) of Figure 7 shows the data, minus the best ond factor is called the Bayes factor. The Bayes factor is fitting P2 orbit, for two cycles of P1 phase. The best fitting the ratio of the marginal (global) likelihoods of the mod- P1 orbit is overlaid. Panel (b) shows the data plotted versus els. The MCMC algorithm produces samples which are in P2 phase with the best fitting P1 orbit removed. proportion to the posterior probability distribution which is

c 8 P. C. Gregory

Table 2. Model parameter estimates.

Parameter Tinney et al. Butler et al. New Parameter New (2005) (2006) analysis analysis

+0.4 +81 P1 (d) 130 ± 1 130.08 ± 0.51 129.8−0.4 P2 (d) 908−94 (129.9) (1001)

−1 ± ± +1.5 −1 +2 K1 (m s )20219.7 3.616.5−1.6 K2 (m s )10−3 (16.4) (12.8)

+.09 +.26 e1 0.32 ± 0.10 0.24 ± 0.16 0.21−.09 e2 0.38−.20 (0.19) (0.61)

+40 +53 ω1 (deg) 126 ± 40 113 123−37 ω2 (deg) 227−48 (99.7) (255)

+.002 +.07 a1 (AU) 0.49 ± 0.04 0.524 ± 0.030 0.492−.001 a2 (AU) 1.80−.08 (0.493) (1.93)

+.03 +.05 M1 sin i (MJ )0.45 ± 0.05 0.420 ± 0.082 0.37−.04 M2 sin i (MJ )0.46−.13 (0.387) (0.48)

+13 +114 Periastron passage 1 11002.8 ± 10 10999 ± 15 11008−13 Periastron passage 2 10605−120 (JD - 2,440,000) (10995) (JD - 2,440,000) (10512)

RMS residuals (m s−1) 7.2 8.2 4.4

Reduced χ2 1.01

fine for parameter estimation but one needs the proportion- Table 3. Fractional error versus β for the results shown in Figure 8. ality constant for estimating the model marginal likelihood. Clyde (2006) recently reviewed the state of techniques for β range Fractional error model selection from a statistics perspective and Ford & −1 44 Gregory (2006) have evaluated the performance of a variety 1.0 − 10 1.55 × 10 −1 −2 6 of marginal likelihood estimators in the extrasolar planet 10 − 10 7.46 × 10 10−2 − 10−3 15.2 context. 10−3 − 10−4 1.17 In this work we will compare the results from three 10−4 − 10−5 0.52 marginal likelihood estimators: a) parallel tempering, b) ra- 10−5 − 10−6 0.35 tio estimator, and c) restricted Monte Carlo. The analysis 10−6 − 10−7 0.16 presented in Section 7.1, 7.2, and 7.3 is based on the old 10−7 − 10−8 0.02 data set (Tinney et al. 2005) and prior #1. These results are summarized in Section 7.4 together with model selec- tion results for the new data set (Butler et al. 2006) using 1 hln[p(D|M ,X,I)]i = X ln[p(D|M ,X,I)] , (18) prior #2. i β n i β t in the interval from β = 0 to 1, from the finite set. For this −8 7.1 Parallel tempering estimator problem we used 34 tempering in the range β =10 to 1.0. Figure 8 shows a plot of hln[p(D|Mi,X,I)]iβ versus β. The The MCMC samples from all (nβ ) simulations can be used inset shows a blow-up of the range β =0.1 to 1.0. to calculate the marginal likelihood of a model according to The relative importance of different decades of β can equation (17) (Gregory 2005a). be judged from Table 3. The second column gives the frac- tional error that would result if this decade of β was not Z ln[p(D|Mi,I)] = dβhln[p(D|Mi,X,I)]iβ, (17) included and thus indicates the sensitivity of the result to that decade. In Ford and Gregory (2006) we constructed a where i =0, 1, 2 corresponds to a zero, one or two planet similar table for the one planet system HD 88133 only we model, and X represent a vector of the model parameters investigated a wider range of β values down to 10−11. The which includes the extra Gaussian noise parameter s.In fractional errors for the β range 10−5 to 10−8 were very words, for each of the nβ parallel simulations, compute the similar to those given in Table 3. The fractional error for expectation value (average) of the natural logarithm of the β =10−11 − 10−8 was only 0.002 and consequently this likelihood for post burn-in MCMC samples. It is necessary range of β can safely be ignored. to use a sufficient number of tempering levels that we can Figure 9 shows a plot of the parallel tempering marginal estimate the above integral by interpolating values of likelihood versus iteration number for two different MCMC

c A Bayesian Kepler Periodogram Detects a Second Planet in HD 208487 9

1 4 30 0.8 3 0.6 20 L PDF PDF

0.4 2 1 -

s 10 0.2 1 m H 128 128.5 129 129.5 130 130.5 131 131.5 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0 P1 HdL e1 0.35 3.5 -10

0.3 3 Velocity 0.25 2.5 -20 0.2 2 HaL PDF PDF 0.15 1.5 -30 0.1 1 0.05 0.5 0 0.5 1 1.5 2 P1 Orbital phase -10 -7.5 -5 -2.5 0 2.5 5 0.2 0.4 0.6 0.8 1 1.2 -1 V Hms L 2ΠΧ1 + Ω1 0.25 0.3 20 0.2 0.25 0.2

0.15 L 1 PDF PDF 0.15 10 0.1 - 0.1 s

0.05 m

0.05 H 0 12 14 16 18 20 22 24 -6 -4 -2 0 2 4 6 -1 K1 Hms L 2ΠΧ1 -Ω1 1.75 -10 0.005 Velocity 1.5 HbL 0.004 1.25 - 0.003 1 20 PDF PDF 0.75 0.002 0.5 0 0.5 1 1.5 2 0.001 0.25 P2 Orbital phase 700 800 900 1000 1100 0.2 0.4 0.6 0.8 P2 HdL e2 2 0.175 0.15 Figure 7. Panel (a) shows the data, with the best fitting P2 1.5 0.125 0.1 orbit subtracted, for two cycles of P1 phase with the best fitting 1 PDF PDF 0.075 P1 orbit overlaid. Panel (b) shows the data plotted versus P2 0.5 0.05 0.025 phase with the best fitting P1 orbit removed. 1 2 3 4 5 6 10 20 30 40 50 60 -1 2ΠΧ2 +Ω2 K2 Hms L 0.35 0.7 0.3 0.6 0 0.25 0.5 0.2 0.4 PDF PDF 0.15 0.3 0.1 0.2 -50000 0.05 0.1 -80

-6 -4 -2 0 2 4 6 2 4 6 8 10 \

L -90 I \

-1 , L

2ΠΧ2 -Ω2 s ms I ,

H L X

, -100 X

4 2 5 -100000 , 2 M

È -110

4 M È

3 D H D

H -120 3 p

2 p PDF PDF ln -130 2 X

-150000 ln 1 X 1 -140

0.4 0.6 0.8 1 1.2 -0.2 0 0.2 0.4 0.6 0.1 0.15 0.2 0.3 0.5 0.7 1 -200000 Χ1 Χ2 Log Β 0.6 0.6 0.5 0.5 0.4 0.4 1. ´ 10-8 1. ´ 10-6 0.0001 0.01 1

PDF 0.3 PDF 0.3 Log Β 0.2 0.2 0.1 0.1

0 1 2 3 4 5 1 2 3 4 5 6 Figure 8. A plot of hln[p(D|Mi,X,I)]iβ versus β. The inset Ω1 Ω2 shows a blow-up of the range β =0.1 to 1.0.

Figure 6. Marginal parameter probability distributions for the Re-arranging the terms and multiplying both sides by h(X) two planet model for the new data set. we obtain p(D|M ,I)p(X|M ,I)h(X)=p(X|M ,I)p(D|M ,X,I)h(X).(20) runs for the two planet model. The solid gray curve shows i i i I the result for the second ratio estimator method which is dis- Integrate both sides over the prior range for X. cussed below. All three curves converge to the same value Z but the ratio estimator converges much more rapidly. The p(D|Mi,I)re p(X|Mi,I)h(X)dX = final parallel tempering marginal likelihood estimates for model M from the two runs were 1.5×10−54 and 3.4×10−54 , Z 2 p(X|Mi,I)p(D|MI,X,I)h(X)dX. (21) yielding an average value of 2.4 × 10−54. The ratio estimator of the marginal likelihood, which we designate by p(D|M ,I) , is given by 7.2 Marginal likelihood ratio estimator i re R p(X|Mi,I)p(D|Mi,X,I)h(X)dX Our second method was introduced by Ford and Gregory p(D|Mi,I)re = . (22) (2006). It makes use of an additional sampling distribution R p(X|Mi,I)h(X)dX h(X). Starting with Bayes theorem To obtain the marginal likelihood ratio estimator, | p(X|Mi,I)p(D|Mi,X,I) p(D Mi,I)re, we approximate the numerator by drawing | 0 p(X Mi,I)= . (19) ˜ 1 ˜ 2 ˜ ns p(D|Mi,I) samples X , X , ···, X from h(X) and approximate the

c 10 P. C. Gregory

of parameter space to be used in the Monte Carlo inte- 3.5 ´ 10-53 gration. RMC integration using 107 samples was repeated

3 ´ 10-53 three times for the two planet model. The results were 1.6 ± 2.6, 1.1 ± 1.2, 2.3 ± 0.6 × 10−54 yielding a weighted -53 2.5 ´ 10 average of 2.1 ± 0.1 × 10−54. The weighted average of RMC -53 −56 likelihood 2 ´ 10 repeats for the one planet model was 2.9 ± 0.1 × 10 .

1.5 ´ 10-53

Marginal 1 ´ 10-53 7.4 Summary of model selection results 5 ´ 10-54 The first three columns of Table 4 summarizes the marginal

250000 500000 750000 1 ´ 106 1.25 ´ 106 1.5 ´ 106 likelihoods and Bayes factors comparing models M0,M1,M2 Iterations for the old data set analyzed using prior #1. For model M0, the marginal likelihood was obtained by numerical integra- Figure 9. A plot of the parallel tempering marginal likelihood tion. For M1, the value and error estimate are based on the versus iteration number (dashed curves) for two different MCMC ratio estimator and RMC methods. For model M2, the two runs for the two planet model. The solid gray curve shows the parallel tempering marginal likelihood estimates differed by result for the ratio estimator method which is discussed in Sec- ∼ tion 7.2. a factor of 2, although their average agreed within 20% with the values obtained by the ratio estimator and RMC methods. Combining the three results yielded a marginal denominator by drawing samples X~ 1, X~ 2, ···, X~ ns from the likelihood good to ∼±10%. The conclusion is that the Bayes β = 1 MCMC post burn-in iterations. The arbitrary func- factor strongly favors a two planet model compared to the tion h(X) was set equal to a multinormal with a covariance other two models. The right hand column of Table 4 gives matrix equal to twice the covariance matrix computed from the P-value for a frequentist statistical hypothesis test of 3 0 5 a sample of the β = 1 MCMC output. We used ns =10 each model separately. At a 95% confidence level the one 4 5 and ns from 10 to 2 × 10 . Some of the samples from a planet model would be rejected but not the two planet multinormal h(X) can have nonphysical parameter values model. Note a two-sided hypothesis test was used because (e.g. K<0). Rejecting all nonphysical samples corresponds both models would have been rejected if the test statistic fell to sampling from a truncated multinormal. The factor re- far enough in either tail region. A value too far in the lower quired to normalize the truncated multinormal is just the tail region would indicate the model was overfitting the data ratio of the total number of samples from the full multinor- for the given noise estimates. Table 5 gives the same infor- mal to the number of physical valid samples. Of course we mation for the new data set analyzed using the improved need to use the same truncated multinormal in the denom- prior #2. The improved data significantly strengthens the inator of equation (22) so the normalization factor cancels. case for the existence of the second planet. From figures 8 and 9 it is clear that the p(D|M2,I)re con- verges much more rapidly than the parallel tempering esti- mator and the parallel tempering estimator, p(D|M2,I)PT, required 34 β simulations instead of one. The final values of 8 DISCUSSION | × −54 p(D Mi,I)re for models M2 and M1 were 2.0 10 and In section 5, we discussed two different strategies to search × −56 3.3 10 . The solid gray curve in Figure 9 shows marginal the orbital frequency parameter space for a multi-planet likelihood ratio estimator versus iteration number for model model: (a) an upper bound on f1 6 f2 6 ··· 6 fn is uti- M2. It lies between the two parallel tempering convergence lized to maintain the identity of the frequencies, and (b) all values. fi are allowed to roam over the entire frequency range and the parameters re-labeled afterwards. In this second case, nothing constrains f − to be less than f so that degener- 7.3 Restricted Monte Carlo marginal likelihood i 1 i ate parameter peaks can occur. For case (b) and an n planet estimate model, the number of possible peaks is n! more than in (a). We can also make use of Monte Carlo integration to evaluate In this work we adopted approach (b) because in repeated the marginal likelihood as given by equation (23). blind frequency (period) searches it always converged on the highest posterior probability peak, in spite of the huge pe- Z p(D|Mi,I)= p(X|Mi,I)p(D|MI,X,I)dX. (23) riod search range. Approach (a) failed to locate the highest peak in some trials and in the cases where it succeeded it Monte Carlo (MC) integration can be very inefficient in required many more iterations. exploring the whole prior parameter range, but once we Apart from their very different relative efficiency to lo- have established the significant regions of parameter space cate the highest probability peak, the two approaches are with the MCMC results, this is no longer the case. The equivalent and equation (12) is the appropriate frequency outer borders of the MCMC marginal parameter distribu- prior for both approaches. As a test of this claim, we ob- −54 tions were used to delineate the boundaries of the volume tained p(D|M2,I)re =1.9 × 10 from an MCMC run us- ing approach (a) and prior #1. This compares closely with the value 2.0 × 10−54 obtained for approach (b). Again from 3 According to Ford and Gregory (2006), the numerator con- an MCMC run using prior #2, we obtained p(D|M2,I)re = −54 −54 verges more rapidly than the denominator. 3.9×10 for (a) and a value of p(D|M2,I)re =3.9×10 A Bayesian Kepler Periodogram Detects a Second Planet in HD 208487 11

Table 4. Marginal likelihoods for old data set (Tinney et al. 2005) and prior #1.

Model Marginal Bayes χ2 Degrees of P-value Likelihood factor freedom

−60 −6 M0 1.60 × 10 (76 ± 7) × 10 −56 M1 (3.10 ± 0.3) × 10 1.0 41.1 24 0.028 −54 M2 (2.2 ± 0.2) × 10 (71 ± 9) 11.51 19 0.19

Table 5. Marginal likelihoods for new data set (Butler et al. 2006) and prior #2.

Model Marginal Bayes χ2 Degrees of P-value Likelihood factor freedom

−69 −6 M0 9.34 × 10 (24 ± 2) × 10 −64 M1 (4.08 ± 0.39) × 10 1.0 65.0 28 0.00009 −62 M2 (9.54 ± 0.87) × 10 234 ± 31 23.3 23 0.45 for (b). Further tests of this claim were carried out in a en- the results of the improved Kepler periodogram and the new tirely different problem involving the detection of three spec- data set of Butler et al. (2006). tral lines. For this problem the marginal likelihoods com- We also derived the form of joint frequency prior to use puted using approaches (a) and (b) agreed to within 1%. for a multiple planet search which is given by It is interesting to compare the performance of the three n! marginal likelihood estimators employed in this work to their p(f1,f2, ···fn|Mn,I)= (24) f f ···f (ln f − ln f )n performance on another data set for HD 88133, which in- 1 2 n H L volved fitting a one planet model (Ford and Gregory 2006). where n is the number of planets. There are two frequency In the latter study, 5 separate (nβ = 32) parallel tempering search strategies: (a) constrain the frequencies such that runs were carried out. After 300,000 iterations, four esti- fL

9 CONCLUSIONS In this paper we demonstrate the capabilities of an auto- 10 BIBLIOGRAPHY mated Bayesian parallel tempering MCMC approach to the REFERENCES analysis of precision radial velocities. The method is called a Bayesian Kepler periodogram because it is ideally suited Bretthorst, G. L., in Bayesian Inference and Maxi- for detecting signals that are consistent with Kepler’s laws. mum Entropy Methods in Science and Engineering, (ed.) However, it is more than a periodogram because it also pro- C.W.Williams, American Institute of Physics Conference vides full marginal posterior distributions for all the orbital Proceedings, 659, pp 3-22, 2003 parameters that can be extracted from radial velocity data. Cumming, A. 2004, MNRAS, 354, 1165 A preliminary re-analysis of the old data for HD 208487 Clyde, M., in “Statistical Challenges in Modern Astron- found evidence for a second planet (Gregory 2005c). This omy IV,” G. J. Babu and E. D. Feigelson (eds.), San Fran- has now been confirmed by the current analysis based on cisco:Astron. Soc. Pacific (in press 2006)

c 12 P. C. Gregory

Eggenberger, A., Udry, S., & Mayor, M. 2004, Astronomy & Astrophysics 417, 353 Ford, E. B. 2005, Astronomical Journal, 129, 1706 Ford, E. B. 2006, Astrophysical Journal, 620, 481 Ford, E. B., & Gregory, P. C., in “Statistical Challenges in Modern Astronomy IV,” G. J. Babu and E. D. Feigelson (eds.), San Francisco:Astron. Soc. Pacific (in press 2006) astro-ph 0608328 Gregory, P. C. 2005a, ‘Bayesian Logical Data Analysis for the Physical Sciences: A Comparative approach with Math- ematica Support‘, Cambridge University Press Gregory, P. C., Astrophysical Journal, Ap. J., 631, pp. 1198- 1214, (2005b) Gregory, P. C. 2005c, in “Bayesian Inference and Maximum Entropy Methods in Science and Engineering”, San Jose, eds. A. E. Abbas, R. D. Morris, J. P.Castle, AIP Conference Proceedings, 803, pp. 139-145 Roberts, G. O., Gelman, A. and Gilks, W. R. 1997, Annals of Applied Probability, 7, 110-120 Tinney, C. G., Butler, R. P., Marcy, G. W., Jones, H. R. A., Penny, A. J., McCarthy, C., Carter, B. D., & Bond, J. 2003, Astrophysical Journal, 587, 423 Tinney, C. G., Butler, R. P., Marcy, G. W., Jones, H. R. A., Penny, A. J., McCarthy, C., Carter, B. D., & Fischer, D. A. 2005, Astrophysical Journal, 623, 1171 Wright, J. T. 2005, Publication of the Astronomical Soc. of the Pacific, 117, 657 Wolszcan, A., & Frail, D. 1992, Nature,355,145

This paper has been typeset from a TEX/ LATEX file prepared by the author.

c