Final Program Report: Astrostatistics

1. Introduction

The principal purpose of the SAMSI program on Astrostatistics is to identify promising research paths for the statistical sciences and applied mathematics in problems of observational astronomy and particle physics, and to initiate research on these problems. Astrostatistics is a growing field in which collaborative researchers identify and develop statistical methods for astronomical research. Many problems in astronomy are distinctive and require new methodologies to analyze the massive data streaming from large and federated sky surveys. The program was organized in collaboration with the Center for Astrostatistics (CASt) (http://astrostatistics.psu.edu) at Penn State. Indeed, the recent Statistical Challenges in Modern Astronomy IV conference, held in June 2006, was the closing workshop for the SAMSI Astrostatistics program.

2. Working Group Activities

2.1 Personnel and Program Organization

Jogesh Babu (CASt, Penn State), who was in residence at SAMSI during January-May 2006, led the program. Four Working Groups were formed, whose principal functions, as in all SAMSI programs, were to organize the research and ensure communication. In addition, two Intensive Research Sessions were organized. The groups functioned for the entire Spring Semester of 2006. The majority of participants were from outside the Triangle area. All the groups had representation from at least two different fields. Some of the working groups shared common statistical issues, and the groups interacted closely in such cases.
The Exoplanets working group, led by Bill Jefferys (Universities of Texas and Vermont) and Merlise Clyde, included: Jogesh Babu (Pennsylvania State University, Department of Statistics), Susie Bayarri (University of Valencia, Department of Statistics and Operations Research), Jim Berger (Duke University, ISDS), Floyd Bullard (Duke University, ISDS), David Chernoff (Cornell University, Department of Astronomy), Pablo de la Cruz (University of Valencia, Observatori Astronomic), Gauri Datta (University of Georgia, Department of Statistics), Peter Driscoll (San Francisco State University), Eric Feigelson (Pennsylvania State University, Department of Astronomy & Astrophysics), Phil C. Gregory (University of British Columbia, Department of Physics and Astronomy), Eric Ford (University of California, Berkeley, Department of Astronomy), Tom Jefferys (Unaffiliated), Michael Last (NISS), Hyunsook Lee (Pennsylvania State University, Department of Statistics), Jaeyong Lee (Seoul National University, Department of Statistics), Tom Loredo (Cornell University, Department of Astronomy), Barbara McArthur (University of Texas at Austin, Department of Astronomy), Raman Narayan (San Francisco State University).
The Surveys and Population Studies working group, led by Tom Loredo (Cornell University), included: Jogesh Babu (Penn State University), Ruth Barrera (National University of Colombia), Brendon Brewer (University of Sydney), Alanna Connors (Eureka Scientific), David Chernoff (Cornell University), Pablo de la Cruz (Universitat de Valencia), Gauri S. Datta (University of Georgia), Matthew Fleenor (University of North Carolina), Martin Hendry (University of Glasgow), Woncheol Jang (Duke University), Kristofer Jennings (Purdue University), Chunglee Kim (Northwestern University), Hyunsook Lee (Penn State University), Kuo-Ping Li (University of North Carolina), Ji Meng Loh (Columbia University), Tom Loredo (Cornell University), Vicent Martinez (Universitat de Valencia), Francisco Vera (NISS), Martin Weinberg (University of Massachusetts, Amherst), Haywood Smith (University of Florida).
The Source Detection and Feature Detection working group, led by David van Dyk (University of California, Irvine), included: Keith Arnaud (NASA, Goddard Space Flight Center), Jim Chiang (GLAST), Alanna Connors (Eureka Scientific), Peter Freeman (CMU), Jiashun Jin (Purdue University), Vinay Kashyap (Smithsonian Astrophysical Observatory), Taeyoung Park (Harvard University), Adam Roy (UC Irvine), Jeff Scargle (NASA, Ames Research Center), Aneta Siemiginowska (Smithsonian Astrophysical Observatory), Alex Young (NASA Goddard Space Flight Center), Yaming Yu (University of California, Irvine), Jogesh Babu (Penn State University), Lingsong Zhang (University of North Carolina), Woncheol Jang (Duke University), Rebecca Willett (Duke University), Eric Feigelson (Penn State), Xiao-Li Meng (Harvard University), Thomas Lee (Colorado State University).
The Gravitational Lensing working group, led by Arlie Petters (Duke), included: Charles Keeton (Rutgers University), Christopher Genovese (CMU), Jogesh Babu (Penn State), Ji Meng Loh (Columbia), Brian Rider (Colorado), Nicholas Robbins (Duke, Grad Student), Francisco Vera (NISS, Postdoc), Liliya Williams (Minnesota), Yaming Yu (U. C. Irvine), Zhengyuan Zhu (UNC).
The Intensive Session on Statistical Issues in Particle Physics, led by Louis Lyons (Oxford, UK), met in March, with heavy emphasis during March 6-16. The members included: Michael Woodroofe (University of Michigan), Kyle Cranmer (Brookhaven Lab), Jim Linnemann (Michigan State), Nancy Reid (University of Toronto), Luc Demortier (Rockefeller University), Joel Heinrich (U. Penn), Giovanni Punzi (Scuola Normale Superiore and INFN), Harrison Prosper (Florida State), Pushpa Bhat (Fermi Lab), Bodhi Sen (University of Michigan), Jogesh Babu (Penn State), John Hartigan (Yale University), Hyunsook Lee (Penn State).
The Intensive Session on Stellar Evolution, led by Bill Jefferys (Universities of Texas and Vermont), met during February 20-23. The members included: Ted von Hippel (University of Texas), Steve DeGennaro (University of Texas), Elizabeth Jeffery (University of Texas), Nathan Stein (University of Texas), David van Dyk (University of California, Irvine), Tom Loredo (Cornell), Theodore Arthur Sande (MIT).

2.2 Achieving Diversity

Three African American researchers were central to the program. Harrison Prosper was a major participant in the Statistical Issues in Particle Physics intensive session, Arlie Petters was the leader of the Gravitational Lensing working group, and Don Richards was on the overall program leaders committee. Participation of women was also extensive. Merlise Clyde was the co-leader of the Exoplanets working group. Susie Bayarri and Barbara McArthur (who gave a keynote presentation at the kickoff meeting) also participated in this group. Ruth Stella Barrera Rojas and Alanna Connors (who was also on the overall program leaders committee) participated via teleconference in the Surveys and Population Studies and the Source and Feature Detection working groups. Hyunsook Lee, a graduate student, participated in all the working groups. In addition, Rebecca Willett, Aneta Siemiginowska, and Ramani Pilla participated in the Source and Feature Detection working group; Nancy Reid and Pushpa Bhat participated in the Statistical Issues in Particle Physics intensive session; and Elizabeth Jeffery participated in the Stellar Evolution intensive session. Two other women, Megan Sosey and Fabrizia Guglielmetti, were involved in the planning meeting for the program.

2.3 Research

Each Working Group had regularly scheduled meetings/teleconferences throughout the program period; most groups also scheduled one or more intensive working sessions that brought many members to SAMSI for face-to-face collaboration. Each Working Group, as well as the two separate intensive sessions on particle physics and stellar evolution, developed a detailed research agenda and followed through.

2.4 Exoplanets

The Exoplanets working group began its work with a two-week intensive session immediately following the January 2006 opening workshop, with many group members in residence at SAMSI for this period. The main goals for this session were to describe the observing processes and planet orbit models in mathematical detail, and to survey existing analysis methods. The many group meetings held during this session quickly got the statisticians “up to speed” on exoplanet modeling, and also allowed astronomers who had been working separately on these problems to learn the details of each other’s approaches. Fortuitously, this period coincided with the announcement of the discovery of an exoplanet by gravitational lensing, so this new observational technique was also briefly covered. Eric Ford and Barbara McArthur played particularly key roles in the early meetings: Ford due to his collaboration with a leading observing team using the radial velocity method (the California/Carnegie collaboration), and McArthur due to her participation in HST astrometric exoplanet observations (with Bill Jefferys). The intensive session was a great success in establishing a common foundation between statisticians and astronomers in the group. This allowed the subsequent weekly group meetings to focus immediately on important technical issues.
The initial meetings focused on model specification (including optimal parameterization), prior selection (establishing a common set of default priors so investigators could better compare results), and MCMC methods for fitting models to radial velocity data. On the last topic, Eric Ford and Phil Gregory made detailed presentations on two very different approaches: Ford described random walk Metropolis samplers, while Gregory described a parallel tempering algorithm, innovative in its use of a feedback control system to tune sampler parameters. McArthur also described a simpler nonlinear fitting approach.
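As a rough illustration of the random walk Metropolis idea mentioned above, the following minimal sketch fits a single-planet, circular-orbit radial velocity curve to simulated data. It is not the group's code: the circular-orbit model, the known period, the flat box priors, the step sizes, and all function names (`rv_model`, `log_posterior`, `random_walk_metropolis`, `demo`) are assumptions made only for this example.

```python
import math
import random


def rv_model(t, k, phi, gamma, period):
    """Radial velocity for one planet on a circular orbit (assumed model)."""
    return gamma + k * math.sin(2.0 * math.pi * t / period + phi)


def log_posterior(theta, times, vels, sigma, period):
    """Gaussian log-likelihood plus flat priors on a bounded box (an assumption)."""
    k, phi, gamma = theta
    if not (0.0 < k < 1000.0 and 0.0 <= phi < 2.0 * math.pi and abs(gamma) < 1000.0):
        return -math.inf
    chi2 = sum(((v - rv_model(t, k, phi, gamma, period)) / sigma) ** 2
               for t, v in zip(times, vels))
    return -0.5 * chi2


def random_walk_metropolis(log_post, start, steps, n_iter, rng):
    """Return the list of visited states and the acceptance fraction."""
    current = list(start)
    current_lp = log_post(current)
    chain, accepted = [], 0
    for _ in range(n_iter):
        # Symmetric Gaussian proposal centred on the current state.
        proposal = [x + rng.gauss(0.0, s) for x, s in zip(current, steps)]
        proposal_lp = log_post(proposal)
        log_ratio = proposal_lp - current_lp
        # Accept with probability min(1, posterior ratio).
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain.append(list(current))
    return chain, accepted / n_iter


def demo(seed=1):
    """Simulate 40 velocities and return (posterior mean of K, acceptance rate)."""
    rng = random.Random(seed)
    period, sigma = 10.0, 3.0
    true_k, true_phi, true_gamma = 50.0, 1.0, 5.0
    times = [rng.uniform(0.0, 100.0) for _ in range(40)]
    vels = [rv_model(t, true_k, true_phi, true_gamma, period) + rng.gauss(0.0, sigma)
            for t in times]
    chain, acc = random_walk_metropolis(
        lambda th: log_posterior(th, times, vels, sigma, period),
        start=(45.0, 0.9, 3.0), steps=(0.5, 0.01, 0.4), n_iter=20000, rng=rng)
    kept = chain[5000:]  # discard burn-in
    k_mean = sum(s[0] for s in kept) / len(kept)
    return k_mean, acc
```

Calling `demo()` returns the posterior mean of the velocity semi-amplitude and the acceptance fraction. A realistic analysis would also sample the period and eccentricity, which is where the parameterization issues discussed in this section become important.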
The presentations enabled Floyd Bullard to quickly write his own exoplanet MCMC algorithm from scratch (based mostly on Ford’s approach). As part of the working group’s effort to create a test bed of problems, Ford obtained actual data sets (from Geoff Marcy) for two systems with strong planet signals, for comparing methods and testing code. A third, challenging data set, which produces significantly multimodal fits, was obtained by Gregory (from Chris Tinney). These data sets were analyzed independently by Ford, Gregory, Bullard and McArthur, and the results compared. An important result of that effort was a heightened realization of the importance of good model parameterizations for improving MCMC performance, especially for fitting low-eccentricity orbits (where parameter identifiability issues arise). Statisticians helped astronomers greatly both with this issue and with improving MCMC output diagnostics. Finally, the shared data and default priors enabled identification of bugs in existing code by allowing more detailed comparison of results than had been possible before.
The early work just described focused on getting sound solutions to the estimation problem of finding what orbits fit a data set from a system known to host a planet. Most of the remaining effort of the working group was devoted to finding good methods for detection, i.e., for comparing models without a planet to those with one or more planets. In the Bayesian framework that the community has adopted for these problems, detection requires calculation of Bayes factors, i.e., ratios of the marginal likelihoods of competing models (likelihoods averaged over all model parameters, weighted by the prior).
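In symbols, the Bayes factor described above can be written as follows. The notation is an assumption introduced here for illustration: D denotes the data, M_0 and M_1 the models without and with a planet, and \theta_k the parameters of model M_k.

```latex
p(D \mid M_k) = \int p(D \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k ,
\qquad
B_{10} = \frac{p(D \mid M_1)}{p(D \mid M_0)} .
```

Each marginal likelihood is an integral over the full parameter space of its model, which is why detection is computationally harder than estimation.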
Besides being of interest for detecting planets in a particular system, marginal likelihoods play a key role in combining results from many systems to infer population properties, and in using existing data from a system to adaptively schedule subsequent observations. Loredo presented his work with Chernoff on adaptive scheduling to further motivate these calculations. Indeed, this is a key motivation for the group’s work. Exoplanet observations are resource-intensive; adaptive scheduling promises to significantly improve observing efficiency for both planet detection and orbit estimation. But the calculations so far have proved daunting. Marginal likelihood calculation is essentially a multidimensional integration problem. Group members worked closely together to explore a wide variety of marginal likelihood calculation algorithms, some of them based on existing multidimensional integration (“cubature”) methods, but several being innovative. The methods are described in reports by Ford, Clyde, and Loredo. Most of the existing methods that were studied were not successful; these include the harmonic mean estimator and a weighted variant the group invented (explored by Ford); thermodynamic integration (Gregory); and nested sampling (Bullard and Clyde). Several more innovative methods appear more promising. Several involve modifications of the well-known importance sampling method for Monte Carlo integration. Gregory had success with a region-restricted importance sampler. Ford, Bullard, and Jefferys had some success with Gaussian mixture importance samplers built by subsampling MCMC output; Berger and Clyde helped refine these algorithms. Berger also invented a hybrid ratio estimator (somewhat along the lines of harmonic mean, but more robust); Ford had some success with this method. 
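To make the importance sampling idea concrete, here is a minimal sketch on a toy problem where the marginal likelihood is known in closed form, so the estimate can be checked. Everything in it is an assumption chosen for illustration (a normal mean model with unit variance, a normal prior, and a single Gaussian proposal centred on the posterior); it is not one of the samplers developed by the group, which worked with mixtures built from MCMC output.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def exact_log_evidence(ys, tau2):
    """Closed form for y_i ~ N(mu, 1) with prior mu ~ N(0, tau2)."""
    n, s, ss = len(ys), sum(ys), sum(y * y for y in ys)
    return (-0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * math.log(1.0 + n * tau2)
            - 0.5 * (ss - tau2 * s * s / (1.0 + n * tau2)))


def importance_log_evidence(ys, tau2, n_draws, rng, inflate=2.0):
    """Estimate log p(y) with a Gaussian proposal centred on the posterior."""
    n, s = len(ys), sum(ys)
    post_var = tau2 / (1.0 + n * tau2)
    post_mean = post_var * s
    prop_var = inflate * post_var  # wider than the posterior, so weights stay bounded
    log_w = []
    for _ in range(n_draws):
        mu = rng.gauss(post_mean, math.sqrt(prop_var))
        log_like = sum(log_normal_pdf(y, mu, 1.0) for y in ys)
        log_prior = log_normal_pdf(mu, 0.0, tau2)
        log_w.append(log_like + log_prior - log_normal_pdf(mu, post_mean, prop_var))
    # Log of the mean importance weight, computed stably (log-sum-exp).
    m = max(log_w)
    return m + math.log(sum(math.exp(lw - m) for lw in log_w) / n_draws)


def demo(seed=0):
    """Return (importance sampling estimate, exact value) of the log evidence."""
    rng = random.Random(seed)
    ys = [rng.gauss(1.5, 1.0) for _ in range(20)]
    return importance_log_evidence(ys, 4.0, 5000, rng), exact_log_evidence(ys, 4.0)
```

In this one-dimensional toy case the estimate agrees closely with the exact value. The difficulty reported above arises in higher dimensions and with multimodal posteriors, where a good proposal is much harder to construct.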
Two other methods were devised that so far have been tested only with “toy models.” Loredo and Berger devised an adaptive kernel density sampler, borrowing ideas from the multivariate locally adaptive KDE literature. Loredo devised an adaptive cubature method that uses MCMC output points to define simplexes that are used in a multivariate generalization of the trapezoid rule.
Most of the group’s work focused on data from systems with a strong exoplanet signal. As the program progressed, there was a growing realization that different methods may be appropriate in different regimes of signal strength; this issue, not appreciated before the SAMSI program, remains to be explored. The group’s marginal likelihood calculations compared zero-planet and one-planet models. Another important open issue is how well the methods scale as the number of planets under consideration increases to two or more, significantly increasing the dimensionality.
Members of the group continue to collaborate on many aspects of this rich family of problems. Their recent work includes further development of marginal likelihood algorithms as well as improvements in MCMC algorithms, including use of new, population-based evolutionary MCMC algorithms. The main goal of this ongoing research is to find sound, robust algorithms for observers to use, both in final data analysis and, especially, in adaptively planning observations.

2.5 Surveys and Population Studies

The Surveys and Population Studies (SPS) working group planned an intensive working session for March 2006. Their early sessions, beginning immediately after the kickoff workshop, thus focused on identifying a baseline of problems and methods that could be fruitfully explored with later, face-to-face collaboration. Astronomer members made several presentations introducing statisticians to various types of astronomical surveys, and to the arcane cornucopia of terminology astronomers use to describe the various biases and distortions that survey techniques introduce into the data (via selection criteria and measurement error). The topics covered included size-frequency distributions (“log N – log S” distributions; Loredo presenting), Lutz-Kelker bias in parallax surveys (Haywood Smith), and Malmquist bias in galaxy surveys (Martin Hendry). Statisticians Jang and Babu provided presentations on methods for handling truncation and censoring (i.e., methods from survival analysis).
Two prominent, recurring themes arose in these presentations that guided much of the group’s subsequent work: the tension between model-based and design-based analysis methods, and the important role of measurement error in astronomical survey data. A unique aspect of astronomical surveys is the combined presence of truncation (often random) and measurement error (often heteroscedastic). Model-based approaches can readily handle both complications, but astronomers’ techniques rely on overly restrictive models. Design-based approaches handle truncation in a more robust way than the model-based methods, but no such methods known to astronomers can account for measurement error.
The March intensive working session brought many group members to SAMSI over a two-week period. In addition to regular members, Woodroofe, who visited SAMSI for the particle physics session, presented to the SPS group his work on estimating velocity distributions in dwarf galaxies using shape-restricted estimation.
This work provides a handle on the amount of dark matter in dwarf galaxies. Presentations and collaborations in the March session addressed finding methods that marry the rigorous error handling of parametric modeling with the robustness of product-limit estimators. We discovered that most existing work on surveys in statistics treats only one of these complications; their simultaneous treatment is a research frontier. Jang, Hendry and Loredo in particular extensively discussed these issues together amidst SPS working group activity. They are currently collaborating on a proposal to be submitted to the NSF astronomy program, to develop new methodology for jointly treating measurement error and random truncation in cosmological surveys, based on ideas from their SAMSI collaboration. One direction they plan to pursue is a combination of Lynden-Bell-Woodroofe nonparametric density estimation with a de-convolving kernel density estimator accounting for measurement error. A second direction is motivated by a discovery by Loredo in the course of preparing SPS presentations. He found that an oft-cited old paper of astronomer Sir Arthur Eddington on measurement error bias includes overlooked results that anticipated some key ideas of shrinkage estimation, a decade or more before their introduction into statistics. Jang, Hendry and Loredo hope to elucidate this implicit connection between shrinkage and measurement error bias in surveys in their proposed research. The March session also considered two additional topics. The first—coincidence assessment— was related to earlier topics by the important role of measurement error. Loredo gave an overview of several recurring astronomical problems that require one to determine whether a newly observed object has a counterpart in a survey; the assessment depends on comparing the measured direction of the object with those of many possible counterparts, all with measurement error—a multiple testing problem. 
Loredo outlined a Bayesian approach, but with several open issues. Jang and Woodroofe brought insights from frequentist multiple testing research to the problem. An intriguing direction for future research involves combining false discovery rate (FDR) control techniques with Bayesian modeling. FDR may be used (with a high rate) to generate a subset of data for a subsequent Bayesian analysis. This both reduces the size of the data set, making the subsequent analysis more computationally tractable, and establishes an objective prior for the subset.
The second new topic in the March session was spatial statistics for understanding the galaxy distribution. Pablo de la Cruz worked with Vicent Martinez to provide an extensive overview of the motivating astronomy and current methodology for understanding the structure of the galaxy distribution, both in terms of how the amount of structure depends on scale, and how to measure the topology of the structure. This set the stage for the meetings immediately following the intensive session. In those meetings, Jang and Loh gave detailed presentations on new statistical methods for measuring structure in spatial data. Jang’s presentation covered a technique he developed just prior to the SAMSI program for accelerating cluster analysis of very large point process data sets by estimating level sets of the underlying density for the process. His starting point is a union-of-balls estimator for the level sets, but Jang has shown how to approximate the estimator with balls centered on regular grid points rather than on data points. This allows the use of FFTs, which greatly accelerates level set estimation and permits application to much larger data sets. His presentation led to a collaboration with Hendry (they had not met prior to the Astrostatistics Program). Hendry helped Jang refine his work for cosmological applications, and the two of them recently submitted a paper on this work for publication.
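The FDR screening step mentioned above for coincidence assessment can be illustrated with the standard Benjamini-Hochberg step-up procedure. This is a minimal sketch under the assumption that each candidate counterpart comes with a p-value; the report does not say which FDR procedure the group considered, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, rate):
    """Return the indices (in the original order) rejected at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rate * rank / m:
            cutoff = rank  # step-up rule: keep the largest rank that passes
    return sorted(order[:cutoff])


# Hypothetical p-values for ten candidate counterparts.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
strict = benjamini_hochberg(example_p, 0.05)   # conservative rate
generous = benjamini_hochberg(example_p, 0.25)  # a "high rate" for pre-screening
```

With the strict rate only the two smallest p-values survive, while the generous rate keeps every candidate, which is the behavior one would want when the selected subset is passed on to a more careful Bayesian analysis.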
At the first SPS meeting, several astronomer participants raised issues about MCMC and marginal likelihood (ML) algorithms tailored to astronomical survey modeling. As a “bookend,” the final weeks of the program saw investigators returning to this topic, inspired in part by parallel work in the Exoplanets group. Loredo and Chernoff began exploring several new methods for MCMC and marginal likelihood (ML) calculations; their research on them is continuing.

2.6 Source and Feature Detection

Recently launched or soon-to-be launched space-based telescopes that are designed to detect and map ultraviolet, X-ray, and gamma-ray electromagnetic emission are opening a whole new window on the cosmos. Because the production of high-energy electromagnetic emission requires temperatures of millions of degrees and is an indication of the release of vast quantities of stored energy, these instruments give a completely new perspective on the hot and turbulent regions of the universe. The complexity of the instruments, of the astronomical sources, and of the scientific questions leads to subtle inference problems that require sophisticated statistical tools. The work of the Source and Feature Detection group focused on developing tools for detecting and classifying physical structure in images and spectra collected with these state-of-the-art instruments. Specific accomplishments of this working group include:

• Development of a theoretical framework for the detection of very faint astronomical sources. The data typically consist of just a few photons, some of which may originate from background contamination. The framework includes terminology and methods for detecting a source and for quantifying how bright an undetected source might be, given the characteristics of the observation. New methods allow us to leverage information obtained from numerous faint sources to learn about the population of sources.

• Object detection in multi-epoch data. With large-scale panchromatic synoptic surveys becoming more common, image co-addition is becoming necessary as new observations start to be compared with a co-added fiducial sky in real time. The standard co-addition techniques have included straight averages, variance-weighted averages, medians, etc. A more sophisticated nonlinear response chi-square method is also used when it is known that the data are background noise limited and the point spread function is homogenized in all channels. Babu (statistician), Mahabal, Djorgovski (astronomers), and Williams (computer scientist) collaborated to develop a robust object detection technique capable of detecting faint sources. The pixel-level analysis, based on Mahalanobis distance, appears to detect sources that are not seen at all epochs and that are normally smoothed out by traditional methods.

• A new implementation of a multi-scale fully-Bayesian method for imaging low-count astronomical sources. This method allows for the quantification of uncertainty in the image, the incorporation of high quality radio-wave images of a source to enhance details in the X-ray or gamma-ray image, and a new method for identifying unexpected physical structure in the image.

• The design of new highly structured models tailored to the specific instrumentation and scientific questions of several NASA missions, including RHESSI (solar data), Chandra (X-ray), GLAST (gamma-ray), and EGRET (gamma-ray). New code was completed for detecting narrow spectral lines with low photon counts. Implementations of other methods for the various instruments are at differing levels of completeness.

• Methods that automate feature detection in solar images are under continued development. These methods aim to identify, track, and monitor the evolution of solar features such as flares, plumes, and sunspot groups. Results using statistical image processing are very promising and appear able to automate what has until now required tedious manual labor.

• The working group will host a special session at JSM 2007 (a topic contributed session). There will be five talks on statistical issues in high energy astrophysics and solar imaging (by David van Dyk, Thomas Lee, Vinay Kashyap, James Chiang, and Alex Young).
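The pixel-level Mahalanobis screening mentioned in the multi-epoch bullet above can be sketched as follows. This is an illustrative assumption about how such a test might be set up, not the collaboration's implementation: each pixel is represented by its vector of fluxes across epochs, the background mean and covariance are taken as known, and the function names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean)."""
    low = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    z = []
    for i, d in enumerate(diff):  # forward substitution: solve low z = diff
        z.append((d - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return sum(v * v for v in z)


def flag_pixels(pixels, mean, cov, threshold):
    """Indices of pixels whose multi-epoch flux vector is far from the background."""
    return [i for i, p in enumerate(pixels) if mahalanobis_sq(p, mean, cov) > threshold]
```

For example, with three epochs, an identity background covariance, and a threshold of 11.34 (roughly the 99% point of a chi-square distribution with three degrees of freedom, valid only under a Gaussian background assumption), a pixel that is bright in a single epoch is flagged even though averaging across epochs would dilute it.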

2.7 Gravitational Lensing

The Gravitational Lensing working group established several ongoing collaborations between statisticians, mathematicians, and astronomers. Arlie Petters (mathematics/physics) and Brian Rider (probability) have been working on the statistics of multiple images in microlensing using the Kac-Rice formula from geometric probability theory. In related work, Arlie Petters and Ji Meng Loh (statistics) are continuing the collaboration they formed during the SAMSI program and are currently studying the distribution and statistics of the saddle and minima images using properties of spatial point processes. Arlie Petters and Charles Keeton (astronomy) are continuing their work on the probability distributions of image magnification in microlensing.
Liliya Williams (astronomy) and Zhengyuan Zhu (statistics) started their collaboration on dark matter inversion methods using gravitational lensing and semi-parametric spatial mixed effects models during the SAMSI program. Gravitationally lensed images are often used to reconstruct the mass distribution in galaxies and clusters of galaxies. The gravitational lensing approach to mass reconstruction is usually preferred to other methods because it does not rely on uncertain assumptions about the physical state of the mass. Furthermore, the visible matter (stars and gas) need not be a good tracer of the invisible mass (dark matter), whose clustering properties are the main scientific motivation for cluster mass reconstruction. In the literature, both parametric and non-parametric mass modeling have been used to reconstruct the mass distribution, each with its own advantages and disadvantages. Williams and Zhu propose to use a semi-parametric mass model for the reconstruction. The mass distribution is modeled as the sum of two parts: a nonlinear parametric mean structure, and the deviation from the mean structure, which is assumed to be a realization of a Gaussian random field (GRF).
They use the restricted maximum likelihood method to estimate the parameters of the mean structure and the covariance function of the GRF from the positions of the lens images, and reconstruct the mass distribution non-parametrically using the best linear unbiased predictor (BLUP). Conditional simulation is used to derive multiple realizations of possible mass distributions under the strong lensing constraints. This methodology has been applied to simulated examples with encouraging preliminary results. They plan to further develop the methodology and algorithms, and to apply them to real lens data from clusters of galaxies. One direction they would like to pursue is to relax the Gaussian assumption and use a transformed GRF (TGRF) for modeling the mass distribution. Another potential direction is to model the mean structure using the visible light from the clusters, and to use the GRF/TGRF as a model to reconstruct the unobserved dark matter distribution, which need not be proportional to that of the visible light.
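The BLUP step described above can be illustrated in a much simplified setting. The sketch below assumes that the field is observed directly and without noise at a few positions, that the mean function and the covariance parameters are already known (in the work described they are estimated by restricted maximum likelihood from lens image positions), and that the covariance has a squared-exponential form. All names and numerical values are hypothetical.

```python
import math


def sq_exp_cov(a, b, variance, length):
    """Squared-exponential covariance between two 2-D positions (an assumed form)."""
    d2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return variance * math.exp(-0.5 * d2 / length ** 2)


def solve(matrix, rhs):
    """Solve matrix x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def blup(target, sites, values, mean_fn, variance, length):
    """Predict the field at `target`: mean + k^T K^{-1} (values - mean)."""
    resid = [v - mean_fn(s) for s, v in zip(sites, values)]
    big_k = [[sq_exp_cov(a, b, variance, length) for b in sites] for a in sites]
    weights = solve(big_k, resid)
    k_vec = [sq_exp_cov(target, s, variance, length) for s in sites]
    return mean_fn(target) + sum(k * w for k, w in zip(k_vec, weights))
```

Two properties are easy to check with this sketch: the predictor reproduces the observed values at the observed positions, and far from any observation it falls back to the parametric mean structure.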

2.8 Intensive session on Stellar Evolution

The period Feb 20-27 was an intensive session at SAMSI where those in residence (van Dyk, Jefferys) worked with Steve DeGennaro, Elizabeth Jeffery and Nathan Stein on various problems. These included improving MCMC sampling, handling field stars, and handling binary stars (a major breakthrough here, since the group thought of a way of doing this that avoids reversible jump or other tricks to sample on spaces of variable dimension). Nathan Stein has produced an internal technical report that documents the group’s software and algorithms, but it has not yet been disseminated publicly. Over the past year since the mini-workshop, with the involvement of David van Dyk in the project, the group has made significant advances, in particular improving the MCMC sampling, which had been a major impediment. Whereas a year ago all of the tests had been on artificial data, the group is now having some success in analyzing real data. They have included the effects of heavy metal abundance in the code and are working on recognizing field stars and distinguishing single stars from binaries. The statistical model is now more detailed and realistic, but is also vulnerable to problems in the underlying stellar evolutionary models: the analysis of real data shows that none of the stellar evolution codes (and no current set of isochrones) adequately models the lower main sequence. The group is actively exploring possible solutions.

2.9 Intensive session on Statistical Issues in Particle Physics

The group has grown out of the PHYSTAT series of Conferences on “Statistical problems in Particle Physics, Astrophysics and Cosmology”, and in particular the participation of Babu and Feigelson at the Stanford PHYSTAT meeting. The PHYSTAT05 Organizing Committee also endorsed the need for a series of Workshops focused on specific problems. A very important feature of the Working group was the opportunity for experimental Particle Physicists and Astrophysicists to interact with Statisticians for discussions as well as participate in the more structured talks. This interaction was invaluable for learning new techniques, correcting misconceptions, and for introducing Statisticians to some of the interesting statistical problems in current analyses. The very active intensive session had a series of meetings at SAMSI during the week of March 6th to 10th. The topics focused on include:

• Upper limits in the presence of nuisance parameters. Results and properties of Bayesian and Frequentist approaches to this problem were presented by Heinrich and Punzi respectively. The presentations by statisticians Reid and Woodroofe were very valuable to the physicists in the group.

• Multivariate methods for signal/background separation. Almost every analysis in particle physics involves such a procedure. Prosper discussed very recent results of Bayesian neural networks that showed good behavior with a remarkably small number of training events. He also raised interesting theoretical questions about how to test compatibility between various multi-dimensional distributions, such as those used for training multivariate procedures.

• Goodness of fit with sparse multi-dimensional data; p-values; discovery. With the advent of the new Large Hadron Collider accelerator at CERN in 2007, probably the most crucial question will be assessing the significance of any possible signal for the Higgs boson or for new physics beyond the Standard Model (e.g. supersymmetry, quark and/or lepton substructure, extra dimensions). This is usually assessed in terms of p-values for the null hypothesis of the Standard Model. As with upper limits, nuisance parameters cause problems; the possibilities were discussed by Cranmer and Demortier. Further studies are in hand to compare the methods they described, and their properties. It was particularly valuable for Particle Physicists to be exposed to Bayesian methods.
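As a simple illustration of the p-value language used in the last topic above, the following sketch computes the p-value of a counting experiment under a background-only null hypothesis and converts it to a one-sided Gaussian significance. It assumes the expected background is known exactly, which sidesteps the nuisance parameter problem that was the actual subject of the session; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def poisson_tail_p_value(n_obs, background):
    """P(N >= n_obs) when N ~ Poisson(background), background assumed known exactly."""
    term, cdf_below = math.exp(-background), 0.0
    for k in range(n_obs):  # accumulate P(N <= n_obs - 1)
        cdf_below += term
        term *= background / (k + 1)
    # Note: the subtraction loses precision for extremely small p-values.
    return max(0.0, 1.0 - cdf_below)


def significance(p_value):
    """One-sided Gaussian significance Z such that P(X > Z) = p_value."""
    if p_value <= 0.0:
        return math.inf
    if p_value >= 1.0:
        return -math.inf
    return -NormalDist().inv_cdf(p_value)
```

For instance, observing 5 events on an expected background of 1.0 gives a p-value of about 0.0037, or roughly 2.7 standard deviations, well short of the conventional 5 standard deviation discovery threshold.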

Michael Woodroofe, John Hartigan, Hyunsook Lee and Louis Lyons remained at SAMSI for the rest of March. This enabled an ongoing series of less formal interactions, including with members of other Working Groups, and with members of the Duke Statistics Department. In particular, the question of anomaly detection is common over a wide variety of subjects, ranging from Astrophysics to Medical studies; the inter-disciplinary discussions that are readily possible in the SAMSI environment are particularly valuable. Particle Physicists in the past have tended to develop their own methods for dealing with the statistical analysis of their data. It was especially valuable to have contact with statisticians who have an understanding of practical statistical problems. As well as those in the Particle Physics Working Group, particular new links have been forged with statisticians Tom Banks, Jim Berger, David van Dyk and Robert Wolpert. Discussions with astrophysicists Bill Jefferys and Tom Loredo also were most valuable. The work done at SAMSI served as an important stepping-stone towards a Workshop held at the Banff International Research Station (BIRS) in July 2006 on “Statistical inference Problems in High Energy Physics and Astronomy.” SAMSI participants Lyons, Linnemann, and Reid organized this meeting. 
The presentations at the SAMSI Intensive session on Statistical Issues in Particle Physics included: Nancy Reid “Modifications to Profile Likelihood”; Nancy Reid “p-value Functions”; Luc Demortier “p-values from A to P”; Jim Linnemann “False Discovery Rate”; Jim Linnemann “Statistical Software Repository for Particle Physics”; Kyle Cranmer “Discovery in Presence of Nuisance Parameters”; Michael Woodroofe “Nuisance Parameters”; Pushpa Bhat “Multivariate Methods”; Harrison Prosper “Signal/Background Discrimination in Particle Physics”; Giovanni Punzi “Ordering Rules for the Neyman Construction with Nuisance Parameters”; Joel Heinrich “Limits and Nuisance Parameters”; Jim Berger “Bayesian Testing”; John Hartigan “Conditioning”; John Hartigan “Stein’s Paradox”; John Hartigan “Bayesian Priors”; Louis Lyons “p-values in Particle Physics”.

The SAMSI meeting resulted in internal notes and improved analyses more than in separate papers on the statistical methods. Thus the benefit of the SAMSI meetings is far greater than indicated by the number of papers. The Large Hadron Collider (LHC) has enormous potential to discover the Higgs boson and physics beyond the Standard Model. However, the experiments at the LHC are entering a new regime in terms of the volume and complexity of their data. This poses a significant challenge to the statistical analysis of their data, and the SAMSI workshop represented the most promising approaches for the LHC experiments. Since the workshop, Cousins and Tucker have submitted for publication a paper that further demonstrates that the most popular of our previous methods fails to perform adequately for many of the problems of the LHC. They point to several of the methods discussed at SAMSI as more appropriate solutions.
Kyle Cranmer presented to the ATLAS collaboration several of the methods discussed at SAMSI for incorporation of systematic errors in new particle searches, and was subsequently appointed one of the ‘experts’ of the ATLAS statistics committee. Jim Linnemann presented the False Discovery Rate technique frequently used in Astrophysics, and it is now being considered as an integral part of one of the strategies in the search for supersymmetry.

In Particle Physics, searches for hypothesized phenomena often find no evidence for the new effect. A historic example of this is the Michelson-Morley attempt to measure the speed of the Earth with respect to the aether. More contemporary examples are the searches for the Higgs boson, Supersymmetric particles, direct observation of dark matter, etc. However, if stringent upper limits can be set on the unobserved phenomena, the null result can perhaps be used to rule out various theories. The topic of how best to set upper limits is thus important, but there is at present no consensus on how this should best be done.

At the July 2006 BIRS workshop, after listing the main methods that have been proposed to set upper limits on cross sections in the presence of nuisance parameters, an attempt was made to collectively construct a matrix that listed their properties. This resulted in considerable discussion, after which it became clear that the matrix was only reasonably complete in the column marked coverage, and that only for a single channel. As all the methods had reasonable coverage properties, more information was needed for a prospective user to decide which to use. A collaborative project to supply the necessary information about each method, the Limits Challenge, was therefore initiated. Proponents of each method agreed to supply their resulting intervals for a common set of test cases. This will permit direct comparison of frequentist coverage, interval lengths, and Bayesian credibility. It was decided that 1 channel and 10 channel cases would be investigated. The results will be presented at the 2007 PHYSTAT-LHC Workshop.

The work started at SAMSI continues and a Workshop is being organized at CERN on “Statistical issues for LHC Physics”. The Large Hadron Collider (LHC) is the very-high-energy proton collider that is due to start operating later this year. It will hopefully make major discoveries of new elementary particles, and so the CERN Workshop will be devoted primarily to issues of quantifying the significance of observed effects, including the influence of nuisance parameters. The planning for the Workshop benefited greatly from the experience of previous Workshops in the PHYSTAT series and from discussions at SAMSI and Banff. More information will be available from the web-site: http://phystatlhc.web.cern.ch/phystat-lhc/index.html
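
The frequentist coverage that the Limits Challenge compares across methods can be illustrated with a small sketch. It assumes the simplest possible setting: a single counting channel, an exactly known background (no nuisance parameters), and the classical one-sided Poisson upper limit. None of these choices is taken from the methods actually entered in the Challenge, and the function names are hypothetical.

```python
import math


def poisson_cdf(n, mu):
    """Return P(N <= n) for N ~ Poisson(mu)."""
    if mu <= 0.0:
        return 1.0
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return min(total, 1.0)


def upper_limit(n_obs, b, cl=0.90):
    """Classical upper limit on the signal mean s, for known background b.

    Solves P(N <= n_obs | s + b) = 1 - cl by bisection; the limit is set
    to zero when the equation has no non-negative solution.
    """
    alpha = 1.0 - cl
    if poisson_cdf(n_obs, b) <= alpha:
        return 0.0
    lo, hi = 0.0, 10.0
    while poisson_cdf(n_obs, hi + b) > alpha:   # bracket the root
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + b) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def coverage(s_true, b, cl=0.90, n_max=200):
    """Probability that the quoted upper limit is at or above the true signal."""
    mu = s_true + b
    prob = math.exp(-mu)                        # P(N = 0)
    covered = 0.0
    for n in range(n_max + 1):
        if n > 0:
            prob *= mu / n                      # P(N = n)
        if upper_limit(n, b, cl) >= s_true:
            covered += prob
    return covered


if __name__ == "__main__":
    for s in (0.5, 1.0, 3.0, 6.0):
        print("s =", s, " coverage =", coverage(s, b=3.0))
```

Because the observed count is discrete, the coverage of this construction is never below the nominal confidence level and is often above it. Comparing such coverage curves, together with interval lengths, across methods is the kind of information the Challenge was set up to collect.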

2.10 Graduate Student Involvement

Five graduate students from US institutions and two from abroad actively participated in the Astrostatistics program at SAMSI. In addition, one student participated in the Intensive session on statistical issues in Particle Physics.

Floyd Bullard (Duke) is the SAMSI Graduate Fellow associated with the Exoplanets working group. He was an active member of the group, attending each meeting, maintaining the group’s web page (http://www.samsi.info/200506/astro/workinggroup/exo/), keeping minutes of the weekly meetings, and coding MCMC, importance sampling and other algorithms for model fitting and model selection. His graduate work is focused on activities of the working group. He has given a presentation in a student seminar series at Duke on the search for exoplanets, and at SAMSI as part of the graduate student and post-doc seminar series. At two or three working group meetings he gave brief presentations of results from his work, such as an attempt to solve a model selection problem using a new technique (integrating over a parameter space using nested sampling). Following up on the SAMSI workshop, he was a research assistant for Merlise Clyde (ISDS, Duke University) during the Fall of 2006, during which time they explored the problem of integrating over a highly multimodal space using nested sampling. He has now begun working on his Ph.D. thesis, which grew out of his participation in the SAMSI program. His thesis topic is “Improving the Efficiency of Scheduling Radial Velocity Measurements for Exoplanet Detection Using Bayes and a Fast Integral Estimator”.

Matthew Fleenor was a final year graduate student in the Physics Department, UNC, during the Astrostatistics Program. His thesis research (under Prof. James Rose, an astronomer) concerned the dynamical and kinematical properties of galaxy clusters, studied via spectroscopic observations of the constituent galaxies. Matt frequently attended SPS working group meetings to learn about open issues and current research on survey analysis methods. He made a special effort to visit SAMSI during the SPS intensive session, e.g., consulting with Martin Hendry. His thesis work was largely completed by the time of the SAMSI program, so the program did not directly impact his thesis work. Matt is now on the faculty in the Physics Department at Roanoke College in Virginia.

Hyunsook Lee (Penn State) is a statistics graduate student with an undergraduate background in astronomy. She attended tutorials and the astrostatistics kickoff workshop. During that time, she presented a poster, titled “Convex Hull Peeling: Nonparametric Multivariate Data Analysis.” Related results were presented at Interface 2006 (Detecting Outliers in Multivariate Massive Data by Convex Hull Peeling with Applications), SCMA IV (Nonparametric Approach to Multivariate Massive Data Analysis by Convex Hull Peeling), and JSM 2006 (A Nonparametric Approach to Descriptive Measures of Multivariate Massive Data Based on Convex Hull Peeling Depth). After the workshop, she joined various focused working group meetings: exoplanets, source and feature detection, gravitational lensing, particle physics, and survey and population studies. She maintained the websites for the Survey and Population Studies working group, and for the Particle Physics group. She was very helpful in providing Survey and Population Studies working group astronomers with information about the strengths and weaknesses of information criteria for model selection (e.g., AIC vs. BIC), and with information about computational geometry tools. She finished her dissertation and graduated from Penn State in 2006. She was an invaluable assistant for the closing workshop SCMA IV.
Feedback from her poster presentation at the kickoff workshop was reflected in her dissertation and in later presentations. She is in the process of writing papers on model selection with a jackknife method and nonparametric massive data analysis with convex hull peeling. The first topic is theoretical in nature and the second focuses on developing algorithms for exploratory data analysis with some supporting theory. Finally, participating in the program as a graduate student led her to a postdoctoral position at the Harvard-Smithsonian Center for Astrophysics, as the only statistician among 900 researchers.

Nicholas Robbins (Duke) maintains the public web-page for the Gravitational Lensing working group. He is in the early stages of his thesis work with Professor Bray. Topics covered in the lensing session may be integrated in his thesis, but it is too early in the semester to say definitively.

Lingsong Zhang (UNC) is interested in multivariate outlier detection and functional data analysis using singular value decomposition. He was in charge of maintaining the website for the Source and Feature Detection working group, and he was also an active participant in the discussions. Lingsong has developed visualization tools for functional data, and is currently working on multi-resolution outlier detection methods for detecting outliers in long-range dependent time series, with applications in Internet anomaly detection. He joined the astrostatistics program to look for interesting astronomy applications to which he can apply his visualization tools and outlier detection methods. He is also interested in developing new methodology for challenging astronomy problems.

Pablo de la Cruz is working on his Ph.D. thesis under the joint supervision of Vicent Martinez (Astronomy) and Jose Miguel Bernardo (Statistics) at the University of Valencia. Pablo resided at SAMSI throughout most of the astrostatistics program, participating predominantly in the Surveys and Population Studies (SPS) group, but also in the Exoplanets group. Pablo was the youngest student participating in these groups; he was a second-year student at the time. He participated in nearly every Exoplanets and SPS working group meeting. He also interacted extensively with researchers when they visited SAMSI, often scheduling one-on-one meetings to learn about their work and methods. He prepared an extensive presentation on “Statistics for the Large Scale Structure”, providing a survey of work on quantifying 2D and 3D structure in the galaxy distribution, and reporting on work in progress with Martinez. Pablo cites his extensive personal interaction with researchers as the most important and rewarding aspect of his SAMSI participation. His peer students in statistics at Valencia for the most part get assigned research problems by their advisors after their second year. Pablo instead is exploring several possibilities together with Martinez; he credits his SAMSI visit with exposing him to a much wider variety of problems and methods than he would have otherwise known about, allowing him to play a much more active role in developing his thesis program. Also, Pablo spent considerable time at SAMSI exploring statistical computing environments, taking advantage of researchers’ varied experiences in many environments to learn about their strengths and weaknesses. He did calculations in R, C, Mathematica, and Python at SAMSI (he has settled on a combination of R and C for his thesis). He also learned MCMC algorithms and especially the importance of output diagnostics. As a measure of the success of the program, he notes that the closing workshop (SCMA IV) was the first scientific meeting he has attended where he felt he really understood the majority of the topics being discussed, and felt involved with the research.
Brendon Brewer is a student in the Physics Department at the University of Sydney in Australia. His thesis research (under Prof. Geraint Lewis, an astronomer) uses Bayesian methods to address inverse problems in astronomy associated with gravitational lensing and asteroseismology data. He was originally invited to participate in the gravitational lens group, but correspondence with Petters indicated that the topics the group was focusing on would not directly address his research interests. However, he was very interested in learning about Bayesian and other methods employed in the SPS and Exoplanets groups. Due to his location in Australia, remote participation was not feasible, so Brendon’s participation was limited to two weeks, when he attended the SPS and Exoplanets intensive research sessions. Brendon was particularly interested in computational techniques for model selection, a topic that arose both in the SPS and Exoplanets groups. Inspired by talks he heard at SAMSI, on his return to Sydney he pursued research on marginal likelihood methods, changing the approach he had previously taken for his work (he is presently using annealed importance sampling; related methods were pursued at SAMSI, especially by Phil Gregory). Brendon met Martin Hendry via the SPS group, and Martin invited him to the University of Glasgow to give a seminar on his thesis work. Brendon has also become interested in survey issues, particularly Malmquist bias (which may play a role in analysis of gravitational lens systems). He discussed approaches to handling Malmquist bias with Loredo, Hendry and Chernoff, and hopes to pursue research on this topic after his thesis is completed.

Bodhisattva Sen is a graduate student in the statistics department at the University of Michigan, Ann Arbor. He is working with Michael Woodroofe and Moulinath Banerjee on his dissertation. A portion of his thesis will be on applications of Statistics in High Energy Physics (more specifically, on construction of confidence intervals in the presence of nuisance parameters in examples that arise frequently in HEP). Bodhi attended both the opening workshop on Astrostatistics (in January 2006) and the intensive session on statistical issues in Particle Physics (in March 2006). Michael Woodroofe presented joint work with Bodhi Sen, “On the Unified Method with Nuisance Parameters”, in the session on Particle Physics; it has now been submitted for publication in a Statistics journal.

3. Workshops

3.1 Planning meeting

In order to begin focusing on the research topics for the Astrostatistics Program, a planning meeting was held at NASA Ames Center during July 14-15, 2005. Thursday, July 14, was devoted primarily to scientific discussion, including learning about the wide variety of research interests of astronomers, physicists and statisticians. Each participant had roughly 30 minutes in which to describe his/her interests or applications, although a significant portion of this time was reserved for questions and discussion. Friday, July 15, was devoted mostly to discussion of the SAMSI program itself, especially discussion of potential participants and the planning of workshops and events for the semester-long program. The participants included: Jogesh Babu (Program Leader), James Berger (Director of SAMSI), Peter Bickel (NAC Co-Chair), Floyd Bullard, Merlise Clyde, Alanna Connors, Andrew Connolly, Phil Gregory, Fabrizia Guglielmetti, Bill Jefferys, Tom Loredo, Louis Lyons, Fionn Murtagh, Don Richards, Jeff Scargle, Megan Sosey, David van Dyk, Larry Wasserman.

3.2 Opening workshop

The January 23-25, 2006 opening workshop for the program attracted 67 attendees from diverse fields including statistics, astronomy, physics and applied mathematics, and met the goal of informing the composition and activities of the Working Groups. Details of the program are at http://www.samsi.info/workshops/2005astro-workshop200601.shtml. All the presentations at the opening workshop are available at the CASt web site http://astrostatistics.psu.edu/samsi06/index.html#workshop.

3.3 Education and Outreach

The Astrostatistics Program began with Tutorials from 1/18/2006-1/22/2006, designed to familiarize statisticians with current trends in astronomy and expose astronomers to modern methodologies in statistics and applied mathematics. These were conducted in collaboration with CASt to prepare astronomers and statisticians for the cross-disciplinary presentations at the opening workshop. The three tutorials were:

• Bayesian Astrostatistics (led by Tom Loredo, Cornell University). This three-day session included several lectures and practicum classes, by Tom Loredo, Bill Jefferys (Universities of Texas and Vermont) and Philip Gregory (University of British Columbia), teaching 31 participants the basic theory and practice of Bayesian statistics, using examples from astronomy.

• Nonparametric statistics and Machine Learning for astronomers (Chad Schafer and Larry Wasserman, Carnegie Mellon University). This two-day tutorial introduced astronomers to modern methods in nonparametric statistics including: kernel regression, local polynomial regression, splines, wavelets, adaptive methods, and density estimation. The tutorial included implementation details in the R language. It had 24 attendees.

• Astronomy for statisticians (Bill Jefferys of Universities of Texas and Vermont, and Eric Feigelson of Penn State). This two-day tutorial reviewed the modern understanding of our universe, spanning planetary systems, stars, the Milky Way Galaxy, extragalactic astronomy and cosmology. Statistical issues underlying the astronomical studies were emphasized and discussed. It had 29 attendees.

All the presentations at the tutorials are available on-line at the Center for Astrostatistics web site http://astrostatistics.psu.edu/samsi06/index.html#Tutorials. In addition to the tutorials, a Seminar Course on Astrostatistics was held during the Program, led by Jogesh Babu.

3.4 Closing Workshop

Statistical Challenges in Modern Astronomy IV (SCMA IV): The fourth in a series of interdisciplinary international research conferences, organized by Babu and Feigelson, served as the closing workshop for the SAMSI Astrostatistics program. It was held at Penn State University on June 12-15, 2006. The scientific program was divided into seven topical sessions with Invited Speakers in astronomy or statistics accompanied by Commentators from the other discipline. The sessions included: Cosmology; Small-N problems; Astronomical surveys; Periodic variability; Recent developments in statistics; Planetary systems; and concluded with Cross-disciplinary perspectives on Physics by Louis Lyons (Oxford); Statistics by James Berger (Duke); and Astronomy by Ofer Lahav (UC London). The conference had 104 participants, of whom 17 were women. The participants included 18 students, all but one from US institutions. Thirty-two participants came from 16 foreign countries: United Kingdom, France, Switzerland, Australia, India, Denmark, Italy, Spain, Canada, Japan, Israel, South Africa, Hungary, New Zealand, Netherlands, and Colombia. Graduate students Hyunsook Lee and Derek Young were very helpful assistants during the conference. The proceedings of the conference are being edited by Babu and Feigelson and will be published by the Astronomical Society of the Pacific.

3.5 BIRS Workshop

The program at SAMSI in March 2006 continued at Banff with a BIRS Workshop on “Statistical Inference Problems in Particle Physics and Astrophysics” in July 2006. This concentrated on three topics: methods for setting upper limits; assessing statistical significance for new phenomena in the presence of nuisance parameters; and multivariate techniques for separating signal and background. A continuing activity from this meeting is the Banff Challenge, in which participants use a variety of techniques to determine upper limits on data provided by Joel Heinrich. He will present a comparison of the performance of these methods (e.g. coverage, interval length, Bayesian credibility, pathologies) at the PHYSTAT-LHC Workshop in June 2007. Two of the methods are being presented at the Joint Statistics Meeting in August. More information on the BIRS Workshop, including the final report, is available at http://www.pims.math.ca/birs/birspages.php?task=displayevent&event_id=06w5054

4. External Support

The SCMA IV conference was supported in part by an NSF grant to the Center for Astrostatistics and a NASA grant to G. J. Babu. The organizers also appreciate financial support from Penn State’s Outreach division, and particularly the skilled work of the conference planner John Farris. Loredo, Chernoff, Clyde and Berger are supported by an NSF grant that partially supported their work and Floyd Bullard’s work during the program. This grant is also supporting Bullard’s thesis work, which continues some of the work of the Exoplanets Working Group. A NASA grant with Ted von Hippel (UT, Austin) as PI, entitled “The Ages and Cooling Physics of White Dwarf Stars from Archival and New HST Observations”, helped support the followup work of the Stellar Evolution group. The funding for the BIRS/SAMSI workshop was mainly provided by BIRS. Kyle Cranmer received funding for the workshop from the Brookhaven Science Associates, which manage Brookhaven National Laboratory.

5. Industrial and Governmental Participation

Because of the nature of the program, there was no industrial involvement. However, there was significant participation in the working groups from government agencies and laboratories such as NASA-Ames, NASA-Goddard, Smithsonian Astrophysical Observatory, Brookhaven National Laboratory and Fermi National Laboratory.

6. Affiliates Participation

There were working group participants from each of the following university affiliates: University of California-Berkeley, Carnegie Mellon University, Duke University, University of Georgia, University of Michigan, University of North Carolina at Chapel Hill, Pennsylvania State University, and Purdue University.

7. Research Highlights

The past year has witnessed some impressive advances in applications of statistical methods to cosmology.

7.1 Other Earths?

Is our solar system special? In particular, are there other Earths in our Galaxy, that is, rocky planets in the habitable regions around sun-like stars? So far over 200 planetary systems have been detected in our region of the Milky Way. The vast majority of extrasolar planets (exoplanets) are too small and dim to be seen directly. Exoplanets are inferred indirectly by detecting the reflex motion of their host star: the minute “wobble” of its position on the sky due to the changing gravitational tug of a planet as it swings round its orbit. Astronomers and statisticians in the exoplanets working group worked together on developing new statistical methods to extract the complex signals from the observations. The data contain significant noise and are sparse and unevenly spaced in time. This often produces significant uncertainty in the properties of candidate planets, thwarting simple analysis methods. The exoplanets working group adopted Bayesian methods to carefully quantify and express uncertainties in planet properties (e.g., planet mass, and orbit size and ellipticity). The group also worked on development of adaptive methods for scheduling ongoing observations of an exoplanet system, to optimize detection of a planet, or estimation of a detected planet’s properties. The approach uses current, incomplete data from a system to predict its future behavior; Bayesian experimental design uses those predictions to identify the best future observation times. The data sparseness and nonuniform sampling combine with highly nonlinear models to make the calculations challenging even for the most modern methods. The group thus created significant new methodology for Bayesian calculation with modest-dimension nonlinear models, including adaptive and population-based MCMC algorithms, and marginal likelihood estimators based on an innovative combination of MCMC output with ideas from importance sampling and locally adaptive multivariate kernel density estimation.
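
To give a sense of why the models are described as highly nonlinear, the sketch below evaluates the standard single-planet Keplerian radial-velocity curve, the kind of signal involved in the radial velocity measurements mentioned elsewhere in this report. It is a textbook formulation rather than the working group's code; the function names, argument names, and example orbital parameters are assumptions for illustration. Each model evaluation requires solving Kepler's equation numerically, and the period, eccentricity, and orientation all enter nonlinearly.

```python
import math


def solve_kepler(mean_anomaly, ecc, tol=1e-12, max_iter=100):
    """Solve Kepler's equation E - ecc*sin(E) = M for E by Newton iteration."""
    m = mean_anomaly % (2.0 * math.pi)
    e_anom = m if ecc < 0.8 else math.pi      # common starting guesses
    for _ in range(max_iter):
        delta = (e_anom - ecc * math.sin(e_anom) - m) / (1.0 - ecc * math.cos(e_anom))
        e_anom -= delta
        if abs(delta) < tol:
            break
    return e_anom


def radial_velocity(t, period, k_amp, ecc, omega, t_peri, gamma=0.0):
    """Line-of-sight velocity of the host star at time t for one planet.

    period : orbital period (same units as t)
    k_amp  : velocity semi-amplitude
    ecc    : eccentricity, 0 <= ecc < 1
    omega  : argument of periastron (radians)
    t_peri : a time of periastron passage
    gamma  : constant systemic velocity offset
    """
    mean_anomaly = 2.0 * math.pi * (t - t_peri) / period
    e_anom = solve_kepler(mean_anomaly, ecc)
    # True anomaly from the eccentric anomaly.
    nu = 2.0 * math.atan2(
        math.sqrt(1.0 + ecc) * math.sin(e_anom / 2.0),
        math.sqrt(1.0 - ecc) * math.cos(e_anom / 2.0),
    )
    return gamma + k_amp * (math.cos(nu + omega) + ecc * math.cos(omega))


if __name__ == "__main__":
    # Sparse, unevenly spaced observation times (illustrative values only).
    times = [0.0, 1.7, 2.1, 9.4, 15.2, 15.9, 31.0]
    for t in times:
        print(t, radial_velocity(t, period=12.5, k_amp=40.0, ecc=0.3,
                                 omega=0.8, t_peri=2.0))
```

A likelihood built on this model is multimodal in the period when the data are sparse, which is consistent with the report's emphasis on adaptive and population-based MCMC.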

7.2 Quantifying Broad Patterns Across the Sky

Late last century, astronomers found evidence for a “Gamma-Ray Halo” by comparing CGRO/EGRET gamma-ray images of the whole sky to the best available physical models. In the gamma-ray sky, the most prominent sources are not “point sources” such as pulsars and active black holes, but broad, irregular swaths of diffuse emission. This gamma-ray signature essentially maps out how highly energetic particles such as cosmic rays impinge on and illuminate both irregular gas clouds and the lower-energy ambient “photon field”. A good understanding of these can help in predicting the Galactic diffuse gamma-ray emission, which in turn would probably help in understanding our Galactic cosmic-ray and diffuse gas environment. The challenge is in quantifying local or micro uncertainties in the images. To tackle this challenge, the Source and Feature Detection working group used highly structured multi-level models (which probabilistically follow the path of photons through one’s telescope), plus Bayesian statistical methods, to construct images from the often limited photon-count data. These models include multi-scale mathematical components that encourage structure in the images at different levels of resolution, enabling the study of both macro and micro structures in the astronomical source. The model encourages local smoothness in the constructed images, but unlike many methods, the Bayesian procedures allow the degree of smoothing to be largely determined by the data. The Bayesian framework also allows information from multiple sources to be combined. The group developed sophisticated new computational tools tailored to these problems. Although computationally expensive, these tools leverage the highly structured model to deliver not only the best guess of an astronomical image but also a quantification of the uncertainty in the best guess. The group is developing new highly structured models tailored to the specific instrumentation and scientific questions of several NASA missions, including RHESSI (solar data), Chandra (X-ray), GLAST (gamma-ray), and EGRET (gamma-ray).

7.3 Search for New Phenomena in Particle Physics

The issue of hypothesis testing from a Bayesian perspective was the subject of a lively discussion, guided in part by an informal presentation by Jim Berger at SAMSI during March 2006. Physicist Harrison Prosper realized that the concepts under discussion (Bayes factors, which require the use of proper priors) could be used in the ongoing search at Fermilab for evidence of the production of single top quarks. The key point was the realization that two well-defined hypotheses were under consideration: the Standard Model with and without single top reactions. Therefore, it was possible to compute a valid Bayes factor without ambiguity and with well-defined priors. Moreover, much of the Bayesian machinery required for the calculation of Bayes factors had already been put in place by the DØ Single Top Group. On December 8, 2006, DØ announced it had, for the first time, evidence that such reactions indeed exist. This was probably the first time that an important physics result used a Bayes factor (or rather an approximation to it, called a Bayes ratio) in the optimization of the associated analyses.
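
The role of proper priors in a Bayes factor can be illustrated with a toy counting experiment. This is not the single top analysis: the known background, the uniform prior on the signal over a finite range, and all function names below are assumptions chosen only to keep the example small. The Bayes factor compares the marginal likelihood under "background plus signal" with the likelihood under "background only"; because the signal prior integrates to one, the ratio is well defined.

```python
import math


def poisson_pmf(n, mu):
    """Return P(N = n) for N ~ Poisson(mu), computed in log space."""
    if mu <= 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))


def marginal_likelihood_signal(n_obs, b, s_max, n_steps=2000):
    """Marginal likelihood under H1 with a uniform (proper) prior on [0, s_max].

    The integral over the signal mean s is done with Simpson's rule;
    n_steps must be even.
    """
    h = s_max / n_steps
    total = poisson_pmf(n_obs, b) + poisson_pmf(n_obs, b + s_max)
    for i in range(1, n_steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight * poisson_pmf(n_obs, b + i * h)
    integral = total * h / 3.0
    return integral / s_max            # prior density is 1 / s_max


def bayes_factor_10(n_obs, b, s_max):
    """Bayes factor of H1 (signal present) against H0 (background only)."""
    return marginal_likelihood_signal(n_obs, b, s_max) / poisson_pmf(n_obs, b)


if __name__ == "__main__":
    for n in (3, 6, 9, 12):
        print("n_obs =", n, " B10 =", bayes_factor_10(n, b=3.0, s_max=20.0))
```

The result depends on the prior range s_max: widening it spreads prior mass over signal values the data disfavor and lowers the Bayes factor. With an improper prior (s_max going to infinity) the ratio would be driven to zero regardless of the data, which is why well-defined priors were essential to the argument above.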

8. Summary

The Astrostatistics program has provided a unique opportunity for extensive interaction and collaboration between astronomers and statisticians on a complex and important set of astronomical data analysis problems. There have been very few similar opportunities in contemporary astronomy. The tutorials and initial contacts together taught statisticians exciting, cutting-edge astronomy, and taught astronomers the latest in statistics, including cutting-edge nonparametric and machine learning algorithms and Bayesian computational technology. The methodological needs have pushed participants in both disciplines into new territory, producing new methods that will prove useful elsewhere in statistics, and specific implementations that are already solving useful astronomical problems (improved exoplanet detection and estimation, X-ray and gamma-ray source detection), with great promise for the future. Collaborations formed during the astrostatistics program continue to flourish.

9. Publications and Technical Reports

• Babu, G. J., Mahabal, A., Williams, R., and Djorgovski, S. G. (2007). “Object detection in multi-epoch data”. To appear in the proceedings of Astronomical Data Analysis IV.

• Clyde, M. A., Berger, J. O., Bullard, F., Ford, E. B., Jefferys, W. H., Luo, R., Paulo, R., and Loredo, T. (2007). “Current Challenges in Bayesian Model Choice”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• Connors, Alanna and van Dyk, David A. (2007). “How to Win with Non-Gaussian Data: Poisson Goodness-of-Fit”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• Ford, E. B., and Gregory, P. C. (2007). “Bayesian Model Selection and Extrasolar Planet Detection”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• Ford, E. B., and Rasio, F.A. (2007). “Origins of Eccentric Extrasolar Planets: Testing the Planet-Planet Scattering Model”. Submitted to ApJ.

• Gregory, P. C. (2007). “A Bayesian Kepler periodogram detects a second planet in HD208487”. Monthly Notices of the Royal Astronomical Society, Volume 374, Issue 4, pp. 1321-1333.

• Heinrich, J., and Lyons, L. (2007). “Systematic Errors”. Annual Reviews of Particle and Nuclear Physics, to appear.

• Jang, W., Hendry, M. (2007). “Cluster Analysis of Massive Datasets in Astronomy”. Submitted to Statistics and Computing.

• Jeffery, E. J., von Hippel, T., Jefferys, W. H., Winget, D. E., Stein, N., & DeGennaro, S. (2007). “New Techniques to Determine Ages of Open Clusters Using White Dwarfs”. ApJ, Volume 658, 391.

• Jefferys, W. H. (2007). “Current Challenges in Bayesian Model Choice: Comments”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• Lee, H. (2006). “Two Topics: A Jackknife Maximum Likelihood Approach to Statistical Model Selection and a Convex Hull Peeling Depth Approach to Nonparametric Massive Multivariate Data Analysis with Applications”. Ph.D. Thesis, Penn State University.

• Loredo, T. J. (2007). “Analyzing Data From Astronomical Surveys: Issues and Directions”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• Lyons, L. (2007). “A particle physicist’s perspective on Astrophysics”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• Maness, H. L.; Marcy, G. W.; Ford, E. B.; Hauschildt, P. H.; Shreve, A. T.; Basri, G. B.; Butler, R. P.; Vogt, S. S. (2007). “The M Dwarf GJ 436 and its Neptune-Mass Planet”. Publications of the Astronomical Society of the Pacific, Volume 119, 90-101.

• Park, Taeyoung, van Dyk, David A., and Siemiginowska, Aneta (2007). “Fitting Narrow Emission Lines in X-ray Spectra: Computation and Methods”. Under revision for the Astrophysical Journal.

• Roe, B. (2007). Nuclear Instruments and Methods, A570, p. 159.

• Sen, B., Walker, M., and Woodroofe, M. “On the unified method with nuisance parameters”. Michigan preprint, submitted for publication.

• van Dyk, David A., Park, Taeyoung, and Siemiginowska, Aneta (2007). “Fitting Narrow Spectral Lines in High-Energy Astrophysics Using Incompatible Gibbs Samplers”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. D. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• von Hippel, T., Jefferys, W. H., Scott, J., Stein, N., Winget, D. E., DeGennaro, S., Dam, A., & Jeffery, E. (2006). “Inverting Color-Magnitude Diagrams to Access Precise Star Cluster Parameters: A Bayesian Approach”. ApJ, Volume 645, 1436.

9.1 Papers in Progress

• “Upper Limits, Detection Limits, and Confidence Intervals” (David van Dyk, Vinay Kashyap, Aneta Siemiginowska, and Andreas Zezas).

• “Detection and Classification of Sunspot Groups Captured in Magnetograms” (Thomas Lee, Alex Young, Vinay Kashyap, and David van Dyk).

• “Statistical Modeling of Sunspot Cycles” (Yaming Yu, Vinay Kashyap, and David van Dyk).

• “Bayesian methods for Exoplanet Radial Velocity Data: The Kepler Periodogram and Evolutionary Markov Chain Monte Carlo” (Loredo, Chernoff, Clyde, Berger and Bullard).

• “Reconstruction of the galaxy cluster mass distribution using gravitational lensing and a semi-parametric spatial mixed effects model” (L. Williams and Z. Zhu).

• “A note on measures of significance in HEP and Astrophysics: some higher order approximations” (Zi Jin, James Linnemann, and Nancy Reid).

• “Likelihood inference for a problem in particle physics” (A. C. Davison and N. Sartori).

• “A Dempster-Shafer Bayesian solution to the Banff A1 Challenge”. To be presented at a meeting in August 2007 (P. Edlefsen).

• “Upper limits for source detection in the three-Poisson model”. To be presented at a meeting in August 2007 (P. Baines).

• “P-values: what they are and how to use them”. CDF note 8662 (L. Demortier).