
NSF Annual Report

2006-2007

Submitted to the National Science Foundation May 1, 2007

NSF Annual Progress Report for 2006-2007

As outlined in the NSF-SAMSI Cooperative Agreement DMS-0112069, the following is the Annual Progress Report for the Statistical and Applied Mathematical Sciences Institute (SAMSI), for the period July 1, 2006 – June 30, 2007. Past and future activities of SAMSI are also discussed.

0. Executive Summary

Executive Summary contains
A. Outline of SAMSI Activities and Initiatives for Year 5 and the Future
B. Financial Overview
C. Directorate’s Summary of Challenges and Responses
D. Synopsis of Research, Human Resource Development, and Education
E. Recommendations from the Renewal Site Visit Report
F. Evaluation by the SAMSI Governing Board
Annual Report Table of Contents

A. Outline of Activities and Initiatives

1. Fifth Year Programs and Activities

Regular Programs
• High Dimensional Inference and Random Matrices (Fall 2006)
  o Opening Workshop and Tutorials (9/17/06-9/20/06)
  o Bayesian Focus Week, Oct. 30 - Nov. 3, 2006
  o Large Graphical Models and Random Matrices, Nov. 9-11, 2006
  o Workshop on Geometry, Random Matrices and Statistical Inference, Jan. 16-19, 2007
  o Transition Workshop (April 10-13, 2007, at AIM)
  o Joint NCAR and SAMSI workshop: Application of Random Matrices: Theory and Methods, May 7-9, 2007
• Development, Assessment and Utilization of Complex Computer Models (Fall 2006, Spring 2007)
  o Summer School on the Design and Analysis of Computer Experiments (8/11/06-8/16/06 at Simon Fraser U.)
  o Opening Workshop (9/10/06-9/14/06)
  o Joint Engineering and Methodology Subprograms Workshop, Oct. 26-17, 2006
  o Joint NCAR and SAMSI workshop: Geophysical Models at NCAR: A Scoping and Synthesis Workshop, Nov. 13-14, 2006
  o Biosystems Modeling Workshop, March 5-7, 2007
  o SAMSI/MUCM Mid-Program Workshop, April 2-3, 2007
  o Terrestrial Mid-Program Workshop, April 4, 2007
  o Transition Workshop, May 14-16, 2007
  o Workshop of the working group on Calibration of Computational Models of Cerebral Blood Flow, May 17, 2007
  o Joint NCAR/SAMSI Workshop: Application of Statistics to Numerical Models: New Methods and Case Studies, May 21-24, 2007

Summer Programs
• Multiplicity and Reproducibility in Scientific Studies (7/10/06-7/28/06)
  o Opening Workshop (7/10/06-7/12/06)
  o Transition Workshop (7/27/06-7/28/06)
• Dynamic Treatment Regimes and Multistage Decision-Making (June 18-29, 2007)
  o Tutorials (6/18/07-6/20/07)
  o Opening Workshop (6/21/07-6/22/07)
  o Transition Workshop (6/28/07-6/29/07)

Education and Outreach
• Summer School on the Design and Analysis of Computer Experiments (8/11/06-8/16/06 at Simon Fraser U.)
• 2-Day Workshop for Undergraduates, focusing on Random Matrices: November 17-18, 2006
• 2-Day Workshop for Undergraduates, focusing on Computer Modeling: March 2-23, 2007
• Interdisciplinary Workshop for Undergraduates (5/21/07-5/25/07)
• The Industrial Mathematical and Statistical Modeling Workshop for Graduate Students (July 23-31, 2007)
• Graduate Courses at SAMSI
  o Flowing Granular Materials, Fall 2006
  o Random Matrices, Fall 2006
  o Environmental Modeling, Spring 2007
  o Geometry, Random Matrices and Statistical Inference, Spring 2007

Planning, Hot Topic, Technology Transfer, and Transition Workshops
• Planning Meeting for a possible Neuronal Modeling program: February 9, 2007, in Pittsburgh.
• Transition Workshop of the Astrostatistics program working group on Statistical Inference Problems in High Energy Physics and Astronomy at BIRS, 7/15/06-7/20/06

2. Sixth Year Program Schedule

• Summer Program on Geometry and Statistics of Shape Spaces (July 7-13, 2007)
  o Tutorials (7/8/07-7/9/07)
  o Opening Workshop (7/10/07-7/12/07)
• Risk Analysis, Extreme Events and Decision Theory (Fall 2007, Spring 2008)
  o Opening Workshop and Tutorials (9/16/07-9/19/07)
  o Mid-program workshops - TBD
  o Transition Workshop (May, 2008)
• Random Media (Fall 2007, Spring 2008)
  o Opening Workshop and Tutorials (9/23/07-9/26/07)
  o Mid-program workshops - TBD
  o Transition Workshop (May, 2008)
• Environmental Sensor Networks (Spring 2008)
  o Planning Workshop (October, 2007)
  o Opening Workshop (1/14/08-1/17/08)
  o Mid-program workshops - TBD
  o Transition Workshop - TBD

Education and Outreach
• 2-Day Workshop for Undergraduates (11/16/07-11/17/07)
• 2-Day Workshop for Undergraduates (2/29/08-3/1/08)
• Infinite Possibilities Workshop: November 2-3, 2007
• Interdisciplinary Workshop for Undergraduates (May 19-23, 2008)
• The Industrial Mathematical and Statistical Modeling Workshop for Graduate Students (July 21-29, 2008)
• Graduate Courses at SAMSI
  o Risk Analysis, Extreme Events and Decision Theory: Fall 2007 and Spring 2008
  o Random Media: Fall 2007 and Spring 2008
  o Environmental Sensor Networks: Fall 2008

Planning and Brainstorming Meetings
• Planning Meeting for Neuronal Modeling program: Fall, 2007
• Brainstorming meeting on Nanotechnology: October 2007, at Purdue Univ.

Tentative Programs for 2008-2009
• Sequential Monte Carlo Methods for Scientific Computing
• Algebraic Techniques in Statistics and Systems Biology

3. Developments and Initiatives

Fifth-Year Developments
• A summer school was tried in advance of the SAMSI Computer Modeling program, and was very successful in jump-starting the research of postdoctoral fellows and graduate students in the program. We are similarly planning a series of educational activities in Fall, 2007, in advance of the Spring 2008 program on Environmental Sensor Networks.
• National accessibility to working groups was greatly enhanced through initiatives discussed in section C of the Executive Summary.
• An NAC Scientific Advisory Board is being created to provide input into the initiation and development of interdisciplinary SAMSI programs.
• The opportunities for external graduate students to participate in SAMSI research programs were greatly enhanced.
• Databases are being updated:
  o Participant and scheduling databases (from IMA) are being installed.
• The quarterly newsletter samsi.info was established.
• Additional research collaborations with other institutes were initiated to enhance the overall impact of mathematics and statistics, including
  o Activities with the National Center for Atmospheric Research, relating to both the Random Matrices and Computer Modeling programs, including a joint postdoctoral appointment and three joint workshops.
  o A transition workshop at BIRS.
  o A transition workshop at AIM.
  o A variety of coordinated activities with the Canadian National Program on Complex Data Structures, including the Summer School on Computer Modeling.

Planned Sixth-Year Developments
• Brainstorming workshops will be held in areas of research in which there is not a clear path forward for creation of a SAMSI program. Initial areas considered for these workshops are:
  o Nanotechnology (at in Fall, 2006)
  o Quantum Computation
• Expansion of space, through an addition to the NISS building, is under consideration.

C. Directorate’s Summary of Challenges and Responses

SAMSI has been successful in achieving its goals. The scientific programs have been of high caliber, and have led to significant new and ongoing research collaborations among statistics, applied mathematics, and disciplinary sciences. There has been significant human resource development, through the postdoctoral and graduate programs and through involvement of senior researchers in new interdisciplinary areas. Many students across the country have been shown the SAMSI vision through educational outreach programs and courses. We feel that these successes are amply demonstrated throughout the report; some highlights are given in section D of the Executive Summary. This section discusses the challenges that arose in Year 5 and the Directorate’s response to these challenges. Additional issues were raised during the site visit of SAMSI as part of the renewal process; these issues and our response to them are outlined in Section E of the Executive Summary.

National Leadership and Scope: The Third Year review highlighted the need for SAMSI to continue the transition to national leadership of programs, and we have completed the transition. The programs in Year 5 were entirely driven by outside leaders, and all programs in Year 6, and those being considered for Year 7, likewise have primary outside leadership. Of course, these programs will still have significant participation of local scientists; indeed, one of the major strengths of SAMSI is its ability to draw to its programs stellar local talent in applied mathematics, statistics, and disciplinary sciences. We realized that it is necessary to have a leading local scientist to coordinate this local talent, so each program is now also assigned a local scientific coordinator.

The Education and Outreach program is also broadening in terms of national scope, in several ways. First, the Education and Outreach Committee is being reconfigured to strengthen the national base. The national committee will be charged with the task of proposing new E&O activities, providing information to potential participants, and disseminating information regarding the SAMSI E&O Program to the national community. We will also be inviting an increasing number of nonlocal faculty to serve as mentors at the week-long undergraduate and graduate workshops. This will provide additional mentoring for participants and provide a conduit for disseminating both the workshop concepts and scientific content back to the respective home institutions.

Program Development: With the move to new national leadership of each new program, we encountered the issue that program leaders were unaware of many of the details of SAMSI operation. We have thus instituted other mechanisms to provide more hands-on guidance, including having directorate and NAC liaisons on each program committee and instituting regular meetings or conference calls between the program leaders, directorate, and support staff; this is now being done for all future programs. The complexity of a SAMSI program is daunting, and we have improved our interactions with program leaders by not burdening them with details before it is necessary. The program is planned in stages, and the program leaders are contacted for key input only when necessary for an upcoming stage. This keeps the job of program leader from being overly burdensome administratively.


Program Operation: Working Groups are the heart of SAMSI programs, meeting weekly or bi-weekly during a program to advance their research agenda. The core of most working groups is formed by the long-term visitors to SAMSI, the postdocs, and interested local participants, but many non-locals had expressed interest in participating in the working groups. This year all of the SAMSI working groups have had significant participation by non-local individuals, as the following mechanisms were instituted:
• The small classroom (203) was converted into a data- and video-conferencing facility. This allowed working groups to be based partly elsewhere while still having some participants residing at SAMSI. Indeed, one remote working group node operated in the Fall at Berkeley, as part of the High Dimensional Inference and Random Matrices Program. Using video-conferencing, the Berkeley node of the working group met for two hours every week with the SAMSI node of the working group.
• We formally contracted with WebEx, a company that hosts web-based data-conferencing, in a manner compatible with all major computer environments: Windows, Mac, and Unix. Thus remote individuals could – from their offices – connect with weekly working group meetings and be full participants. Nearly 40 individuals utilized this capability on a regular basis to connect with SAMSI working groups.

Summer Programs: One of the challenges we faced was that the community asked for opportunities to participate in SAMSI research during the summer, and not just during the academic year. In answer we initiated intensive summer research programs, the first of which was held last summer on Multiplicity and Reproducibility in Scientific Studies. This proved to be very successful, and we are planning two such summer programs this year.

Program Evaluation: Improving the SAMSI evaluation system is a never-ending challenge. A great deal of information is gathered, including an extensive new annual survey of past SAMSI postdocs and program participants. The information from this and the real-time evaluation schemes is presented in Section B and Appendices E and G. We are still in the process of obtaining a new database system, which will automate maintenance of contact information and tracking of career development of SAMSI Postdocs, new researchers and students. It will also provide immediate access to information about participant composition for workshops, programs and other activities. We acquired the database system used by the IMA, and hired a consultant (the actual designer of the IMA database – he subsequently moved to North Carolina) to modify the system to accommodate differences between SAMSI and IMA. Planned future expansions of the database will include compilations of publications supported by SAMSI or resulting from continuations of SAMSI research.

Human Resources: We continue to focus on the postdoctoral program. The pool of applicants continues a sharp upward trajectory, as SAMSI increases in visibility. This year the applicant pool was 140 candidates, up from 99 last year.

Last year was the first year the number of long-term visitors (not counting postdocs or graduate students) was in double digits, and the numbers increased again this year. Indeed, we had 21 visitors in residence at SAMSI for three or more months (up from 12 last year), and numerous shorter term visitors. Of the long-term visitors, 7 were new researchers, many of whom were becoming involved in a new area of interdisciplinary research. This significant increase in participation was partly due to increased recognition of SAMSI, and partly due to a more pro-active role of the directorate: we now specifically contact all individuals that were suggested by program leaders as participants, and further ask these individuals for suggestions (with emphasis on under-represented groups and new researchers). This leads to a widening web of contacts for a program and a greatly increased set of interested individuals.

We have developed mechanisms to enable SAMSI Graduate Fellows to be non-local. Indeed, this year we had 6 graduate fellows at SAMSI from non-local institutions, and all were heavily involved in the research programs.

Achieving diversity is a never-ending challenge. In addition to the many aspects of diversity involved in our regular Education and Outreach activities, SAMSI co-sponsored the Conference on African-American Mathematical Scientists, with the Department of Mathematics at UNC Chapel Hill, in June, 2006. In November 2007, SAMSI will be cosponsoring an Infinite Possibilities Workshop for minority women in mathematics and statistics.

Facilities: The large influx of visitors was handled by utilizing shared offices for all postdocs and visitors. The local faculty fellows, who were major participants in the programs, were accommodated by utilization of a large shared office rented from NISS. Neither of these is desirable, and we are severely space-constrained for next year. This is a major reason NISS is considering a building expansion.

D. Synopsis of Developments in Research, Human Resource Development, and Education

In later parts of the report, the extensive developments in research and education that have occurred under SAMSI research programs are discussed in detail. To give a flavor of these developments, we highlight some of the findings here, focusing on those for which primary activity ended during this past year.

1. Research

a) ASTROSTATISTICS: Some impressive advances in development of statistical methods for cosmology arose from this program.

Other Earths? Is our solar system special? In particular, are there other Earths in our Galaxy – rocky planets in the habitable regions around sun-like stars? So far over 200 planetary systems in our region of the Milky Way have been detected. The vast majority of extrasolar planets (exoplanets) are too small and dim to be seen directly. Exoplanets are inferred indirectly by detecting the reflex motion of their host star – the minute “wobble” of its position on the sky due to the changing gravitational tug of a planet as it swings round its orbit. Astronomers and statisticians in the exoplanets working group worked together on developing new statistical methods to extract the complex signals from the observations. The data contain significant noise and are sparse and unevenly spaced in time. This often produces significant uncertainty in the properties of candidate planets, thwarting simple analysis methods. The exoplanets working group adopted Bayesian methods to carefully quantify and express uncertainties in planet properties (e.g., planet mass, and orbit size and ellipticity). The group also worked on development of adaptive methods for scheduling ongoing observations of an exoplanet system, to optimize detection of a planet, or estimation of a detected planet's properties. The approach uses current, incomplete data from a system to predict its future behavior; Bayesian experimental design uses those predictions to identify the best future observation times. The data sparseness and nonuniform sampling combine with highly nonlinear models to make the calculations challenging even for the most modern methods. The group thus created significant new methodology for Bayesian calculation with modest-dimension nonlinear models, including adaptive and population-based MCMC algorithms, and marginal likelihood estimators based on an innovative combination of MCMC output with ideas from importance sampling and locally adaptive multivariate kernel density estimation.

Quantifying Broad Patterns Across the Sky: Late last century, astronomers found evidence for a “Gamma-Ray Halo” by comparing CGRO/EGRET gamma-ray images of the whole sky to the best available physical models. In the gamma-ray sky, the most prominent sources are not “point-sources” such as pulsars and active black holes, but broad, irregular swaths of diffuse emission. This gamma-ray signature essentially maps out how highly energetic particles such as cosmic-rays impinge on and illuminate both irregular gas clouds and the lower-energy ambient “photon field”. A good understanding of these can help in predicting the Galactic diffuse gamma-ray emission. This would probably help in understanding our Galactic cosmic-ray and diffuse gas environment. The challenge is in quantifying local or micro uncertainties in the images. To tackle this challenge, the Source and Feature Detection working group used highly-structured multi-level models (which probabilistically follow the path of photons through one's telescope), plus Bayesian statistical methods, to construct images from the often limited photon-count data. These models include multi-scale mathematical components that encourage structure in the images at different levels of resolution, enabling the study of both macro and micro structures in the astronomical source. The model encourages local smoothness in the constructed images, but unlike many methods, the Bayesian procedures allow the degree of smoothing to be largely determined by the data. The Bayesian framework also allows combination of information from multiple sources. The group developed sophisticated new computational tools tailored to these problems. Although computationally expensive, these tools leverage the highly-structured model to deliver not only the best guess of an astronomical image but also a quantification of the uncertainty in the best guess. The group is developing new highly structured models tailored to the specific instrumentation and scientific questions of several NASA missions, including RHESSI (solar data), Chandra (X-ray), GLAST (gamma-ray), and EGRET (gamma-ray).

Search for New Phenomena in Particle Physics: The issue of hypothesis testing from a Bayesian perspective was the subject of a lively discussion during March 2006. Physicist Harrison Prosper realized that the concepts under discussion – Bayes factors, which require the use of proper priors – could be used in the ongoing search at Fermilab for evidence of the production of single top quarks. The key point was the realization that two well-defined hypotheses were under consideration: the Standard Model with and without single top reactions. Therefore, it was possible to compute a valid Bayes factor without ambiguity and with well-defined priors. Moreover, much of the Bayesian machinery required for the calculation of Bayes factors had already been put in place by the DØ Single Top Group. On December 8, 2006, DØ announced it had, for the first time, evidence that such reactions indeed exist. This was probably the first time that an important physics result used a Bayes factor (or rather an approximation to it, called a Bayes ratio) in the optimization of the associated analyses.
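To make the role of proper priors concrete, the following is a minimal Python sketch of a Bayes factor for a toy counting experiment. It is not the DØ analysis: the Poisson counting model, the exponential prior on the signal rate, the numerical integration scheme, and every name and number below are assumptions chosen only for illustration. It shows that with two well-defined hypotheses (background only, and background plus signal) and a proper prior, both marginal likelihoods are finite and their ratio is unambiguous.

```python
import math


def poisson_pmf(n, mean):
    """Probability of observing n events when the expected count is mean (> 0)."""
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def bayes_factor(n, background, prior_mean_signal, grid=20000):
    """Bayes factor of H1 (background + signal) against H0 (background only).

    H0: n ~ Poisson(background)
    H1: n ~ Poisson(background + s), with the hypothetical proper prior
        s ~ Exponential(mean = prior_mean_signal).
    The H1 marginal likelihood is integrated over s with the midpoint rule.
    """
    upper = 40.0 * prior_mean_signal  # prior mass beyond this point is about e^-40
    step = upper / grid
    marginal_h1 = 0.0
    for i in range(grid):
        s = (i + 0.5) * step
        prior_density = math.exp(-s / prior_mean_signal) / prior_mean_signal
        marginal_h1 += poisson_pmf(n, background + s) * prior_density * step
    marginal_h0 = poisson_pmf(n, background)
    return marginal_h1 / marginal_h0


# A count equal to the expected background favors H0; a large excess favors H1.
no_excess = bayes_factor(5, background=5.0, prior_mean_signal=10.0)
large_excess = bayes_factor(20, background=5.0, prior_mean_signal=10.0)
```

In this toy model, zero observed events gives the closed form 1 / (1 + prior_mean_signal), which is a convenient check on the numerical integration. An improper (non-normalizable) prior on s would leave the H1 marginal likelihood defined only up to an arbitrary constant, which is why the text stresses proper priors.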

b) NATIONAL DEFENSE AND HOMELAND SECURITY: The SAMSI program on National Defense and Homeland Security (NDHS) achieved its goal of identifying and following promising research paths for the statistical sciences, applied mathematics and decision sciences in problems of NDHS. Four Working Groups operated throughout the 2005-06 program year: Agricultural Systems, Anomaly Detection, Data Confidentiality, and Social Networks. All working groups had significant external participation (22 external individuals total), produced publications, and many continued operation past the end of the program. The Agricultural Systems Working Group remained active through March, 2007, producing a paper (recently submitted) entitled “Stochastic and Deterministic Models for Agricultural Production Networks” involving researchers from agriculture, applied mathematics, and statistics and at all levels from senior researcher to graduate student. The paper is an illustration of the many connections arising in a SAMSI working group.

The Data Confidentiality Working Group evolved into a continuing working group of the NISS/SAMSI Affiliates Program, and continues to meet weekly. Multiple publications catalyzed by the SAMSI working group were completed.

The Anomaly Detection Working Group made significant advances in several areas of research, including Bayesian multiple hypothesis testing for Poisson data and the use of mixing distributions to analyze potential spurious observations. Key advances were made in finding default prior distributions that are computationally implementable and which provide for an automatic ‘correction for multiplicity.’ The work will be broadly applicable to the many anomaly detection problems where the data are discrete.

An informal Transition Workshop for the NDHS program was held in Research Triangle Park, NC, in October, 2006, in conjunction with the Army Conference on Applied Statistics (ACAS). Three sessions at the ACAS contained presentations resulting from the NDHS program, informing the nearly 100 attendees about the achievements of the program.

c) HIGH DIMENSIONAL INFERENCE AND RANDOM MATRICES: Much of the work of this program will be finalized in the coming months, but preliminary research results of considerable interest are as follows.

A number of unanticipated connections arose out of the program. One such was between the Geometric Methods working group and that on Multivariate Distributions. Research from the latter group concerning central limit theorems on the space of positive-definite matrices proved critical in advancing the work of the Geometric Methods group. The working group on Graphical Models made important advances in understanding the connection between regularization and Bayesian methods within graphical models. The Universality working group combined applied mathematicians working in integrable systems with statisticians. Although these viewpoints are very different, the synergy was evident, with the statistical issues associated with random matrices significantly influencing the perspective on integrable systems.

The level and importance of the work carried out at SAMSI in random matrices is reflected in the fact that the Annals of Statistics is devoting a special issue to the research emerging from the program. Peter Bickel is a guest associate editor for this issue and is collecting papers this spring. There have been seven submitted so far and two already accepted. An example is a deep paper on the eigenstructure of banded matrices. The volume is expected to contain 10-12 papers.

The Random Matrices program represented the first full use of the video facilities at SAMSI. A dual-node meeting was held every Friday during which a facility in Berkeley was fully connected with SAMSI through video, audio and computer hookups. It served both the Universality and Regularization working groups. It was a great success, and greatly extended the scope of the two working groups.

d) DEVELOPMENT, ASSESSMENT AND UTILIZATION OF COMPLEX COMPUTER MODELS: Much of the work of this program will be finalized in the coming months, but preliminary research results of considerable interest are as follows.

The working group on Air Quality is investigating the question of calibration of functional outputs. There has not previously been such a calibration of an air quality model, so the methodology being developed will be of considerable significance.

The working group on Granular Materials - Engineering Applications has tackled the problem of prediction of extreme lava flows from volcanoes, based on a merging of statistical and applied mathematical methodologies. Available data has been used to develop a novel probabilistic model (based on infinitely divisible distributions) for frequency and volume of volcanic flows. This probabilistic model is then used with a (finite element) computer model of volcanic flow to predict the probability of a catastrophic event at a given location over a specified time period (e.g., 50 years). The analysis requires construction of a novel multi-scale emulator of the computer model, and development of an innovative method of importance sampling to compute the tail probability. This is a collaboration of applied mathematicians, statisticians, engineers, and geophysicists.

The working group on Methodology is pursuing a number of exciting research topics. One such is an unexpected connection that has been found between the use of time-dependent parameters in environmental computer models (utilized to account for model error) and stationary stochastic processes of Langevin type studied in a previous SAMSI program (Network Modeling for the Internet); such cross-fertilization between programs is a SAMSI goal. The Methodology research has also resulted in significant developments in the creation of emulators of computer models that are expensive to run. Emulators are approximations to the computer model required for a host of activities, including studying the sensitivity to inputs, assisting in optimization, calibrating unknown parameters of the computer model, and validation of the computer model. One exciting research direction has been the creation of efficient emulators for large observational spaces by using sparse representations for spectral space covariance and/or state-space Gaussian processes. Another promising research direction has been the incorporation of derivative information (often available from computer models) in the construction of the emulator. This not only appears to make the emulator considerably more accurate, but also extends its predictive performance outside the range of the training observations. The extension utilizes the fact that suitable Gaussian processes have derivatives that are themselves a Gaussian process with known joint distribution (with the process). These methodological advances are being tested on a variety of real computer models available in the program.

The working group on Terrestrial Models is attacking the problem of predicting long term biodiversity, through use of extensive modeling. Preliminary results indicate that the model predictions are quite different from current beliefs in the field. This is a collaboration of statisticians and environmental scientists.
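As a rough illustration of what an emulator is, the following Python sketch fits a basic Gaussian process interpolator to a handful of runs of a stand-in "computer model" and then predicts, with an uncertainty estimate, at untried inputs. It is a textbook construction and not the working group's methodology: the squared-exponential covariance, the fixed length scale, the small jitter term, the use of math.sin as the simulator, and all function names are assumptions, and the sparse, state-space, and derivative-based extensions described above are not shown.

```python
import math


def sq_exp_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed form)."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_emulator(xs, ys, length_scale=1.0, jitter=1e-10):
    """Return predict(x) -> (mean, variance) for a zero-mean GP fitted to the runs."""
    k = [[sq_exp_kernel(a, b, length_scale) + (jitter if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    weights = solve(k, ys)  # K^{-1} y, computed once

    def predict(x_new):
        k_star = [sq_exp_kernel(x_new, a, length_scale) for a in xs]
        mean = sum(w * ks for w, ks in zip(weights, k_star))
        reduction = sum(ks * v for ks, v in zip(k_star, solve(k, k_star)))
        return mean, max(sq_exp_kernel(x_new, x_new, length_scale) - reduction, 0.0)

    return predict


simulator = math.sin  # stand-in for a computer model that is expensive to run
design = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # inputs at which the model was run
emulate = fit_emulator(design, [simulator(x) for x in design])
mean_at_run, var_at_run = emulate(2.0)      # an input that was actually run
mean_between, var_between = emulate(2.5)    # an untried input
```

The emulator reproduces the simulator at the design points with essentially zero variance and reports larger variance between them. That variance estimate is what makes emulators usable for the sensitivity, calibration, and validation tasks listed above.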

e) MULTIPLICITY AND REPRODUCIBILITY IN SCIENTIFIC STUDIES: As a result of this three-week summer program, a major new insight was obtained into dealing with the pervasive problem in many sciences of having to simultaneously perform thousands of hypothesis tests (e.g., in microarray analysis). By dealing with the multiplicities through a priori specifications, and utilizing sophisticated search strategies in model space, entire new classes of problems were opened up to solution.
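One standard way in which a prior specification can absorb multiplicity, which may or may not be the construction developed in the program, is to let each of m tested effects be real with a common unknown probability p and to give p a uniform prior. Integrating p out gives any specific configuration with k real effects the prior probability 1 / ((m + 1) C(m, k)). The Python sketch below, with hypothetical function names, shows the consequence: the prior odds for declaring one more effect real shrink automatically as the number of tests grows.

```python
from math import comb


def model_prior_prob(k, m):
    """Prior probability of one specific configuration with k of m effects real,
    when each effect is real with probability p and p ~ Uniform(0, 1).
    This is the Beta integral of p^k (1 - p)^(m - k) over [0, 1]."""
    return 1.0 / ((m + 1) * comb(m, k))


def prior_odds_penalty(k, m):
    """Prior odds of a specific (k + 1)-effect configuration against a
    specific k-effect configuration; algebraically (k + 1) / (m - k)."""
    return model_prior_prob(k + 1, m) / model_prior_prob(k, m)


# Prior odds for declaring the first effect among 10 tests versus among 1000.
penalty_10_tests = prior_odds_penalty(0, 10)
penalty_1000_tests = prior_odds_penalty(0, 1000)
```

With 10 tests the first declared effect carries prior odds of 1/10, and with 1000 tests 1/1000, so the data must supply correspondingly stronger evidence without any explicit adjustment of significance levels.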

2. Human Resource Development

SAMSI’s impact on human resources is fully discussed in sections I.B and I.C, with impact on diversity highlighted in section I.H. The individual program reports also contain significant insight into human resource development. Here we give a summary of SAMSI’s impact on human resource development. SAMSI’s impact on human resources is reflected on an individual level by the new interdisciplinary research directions and continuing collaborations of program participants at all levels. The impact is also documented by the broad constituencies that participate in SAMSI activities, and the satisfaction expressed at the conclusion of these activities. Few people have left SAMSI doing exactly the same kind of research as when they arrived. Graduate students are often influenced to develop dissertations from the research problems or to incorporate applications they encountered at SAMSI into their doctoral research. Undergraduates are influenced to continue to involve mathematical sciences in their graduate school programs – with some applying to doctoral programs in statistics or applied mathematics. Whether statistician, mathematician or scientist and whether resident at SAMSI for a few days for a workshop, for a longer-term visit, for a semester sabbatical or for a full year, SAMSI participants are immersed in an interdisciplinary world of researchers with heterogeneous expertise, different formulations of research priorities, and challenges that no single viewpoint or discipline can encompass and resolve. The demand for participation in SAMSI events has been rising sharply; in SAMSI’s fourth year, the programs are regularly oversubscribed. The resident SAMSI Community has also grown: in 2006-7 (through April) it includes 32 long-term visitors (a month or more), 20 short-term visitors (a week to a month), and many shorter term visitors. 
Long-term residence from new researchers included 12 new researcher faculty, 14 postdoctoral fellows and associates, and 21 graduate students (7 were visiting graduate students). During 2006-7, 87 researchers participated remotely as individuals in working groups (31 in Random Matrices, 50 in Complex Computational Models), while over 20 participated in the Random Matrices Program via videoconference as a group based at UC Berkeley-Stanford. The total number of participants in SAMSI events is expected to be 1100, a dramatic increase from the 2002–3 total of 780, and they are widely dispersed geographically.

Postdoctoral recruitment is highly successful, drawing from an applicant pool that has increased steadily both in number and in quality. For 2007-8, 140 candidates applied for postdoctoral fellowships, a 30% increase from 2005-6.

For principal researchers (past the postdoctoral stage of their careers), SAMSI offers a unique opportunity to bring complex research ideas and pertinent data to a common setting in search of collaboration and the potential development of new mathematical / statistical tools. For other mathematicians and statisticians, this is an opportunity to be tutored in the disciplinary science and the underlying conceptualization of the scientific objectives by leaders in their field in order to understand these ideas accurately and to integrate mathematical-statistical thinking. Consequently, senior mathematicians and statisticians at SAMSI open new research directions, initiate research collaborations, re-energize research careers, and find applications that alter their teaching and mentorship. Program leaders and other senior and junior researchers in long-term residence (e.g., a semester or longer) also have responsibilities as mentors and research supervisors for postdoctorals and students, and often teach SAMSI classes offered for graduate credit.

For the 23 Postdoctoral Fellows and four Associates coming to SAMSI from 2002 through 2006, the expansion of their career horizons began with their own interdisciplinary origins at 24 institutions with Ph.D. degrees in the mathematical sciences plus five domain sciences. Most of all, their SAMSI experiences gave them new views of the interlacing of statistics and mathematics and the integration of domain science with mathematical science, leading to publications in journals across a wide range of disciplines and to continuing collaborations after leaving SAMSI. Career impact immediately upon leaving SAMSI is evident in the continuing research/academic positions of 21 of 22 SAMSI 2002–5 Postdoctoral Fellows and Associates; of these, 16 continue research begun at SAMSI and 15 continue SAMSI collaborations. Their research output appears in 28 journals in mathematics and statistics plus 15 domain science journals.

Graduate students who serve as SAMSI research assistants participate in working groups and accrue multiple benefits. For some, the prime benefit is a dissertation topic; for others, it rests in the opportunity to work closely in a high-powered research environment with leaders in the field; and for others still it is the opportunity of broadening the scope to expand their expertise into mathematical/statistical areas.

Undergraduates who have come to SAMSI from 30 states and from different kinds of academic institutions also take away a sense of excitement at the power of mathematics and statistics when integrated with scientific investigation. By intention, undergraduate workshops involve students from diverse academic backgrounds.
Regardless of the diversity of their career plans, most see the SAMSI experience as formative; for many of the undergraduates, the SAMSI experience provides the first solid link between the technical world of mathematics and statistics and real world problems.

Diversity: SAMSI policy is to give attention to diversity issues throughout all activities, especially in the Postdoc selection process and in the organization and operation of Workshops and Programs. SAMSI’s success in maintaining a diverse community began with the first activities and continues. Numbers for 2006-7 include only the academic year activities through March; summer activities especially include students and new researchers. Because of the vigorous attention given to diversity, underrepresented minorities are not underrepresented at SAMSI.

[Figure: Underrepresented Groups. Percent of participants by program year, with series % Female, % African-American, % Hispanic, and % New Researcher-Students. Total participants: 2002-03, 780; 2003-04, 703; 2004-05, 817; 2005-06, 924; 2006-07 (through March 2007), 768.]

Workshop Evaluations: Detailed evaluations of workshops are given in Appendix G. Here are the summary graphs indicating the satisfaction of participants.

[Figure: Summary of Science at SAMSI Workshops (2002-March 2007). Percent of responses (1394 total) by year, rated Excellent, Very Good, Good, Fair, or Poor. 2002-03: 4 events, 132 responses; 2003-04: 12 events, 177 responses; 2004-05: 13 events, 261 responses; 2005-06: 19 events, 419 responses; 2006-07: 16 events, 358 responses.]


[Figure: Workshops 2006-07 Summary: 13 Events. Percentage of responses (291 total) rated Excellent, Very Good, Good, Fair, or Poor for each item: Science, Staff, Facilities, Lodging, Transport.]

[Figure: Undergraduate Workshops 2006-07 Summary: 3 Events. Percentage of responses (67 total) rated Excellent, Very Good, Good, Fair, or Poor for each item: Science, Staff, Facilities, Lodging, Transport.]

National and International Breadth: The following four graphs show that workshop participation is highly geographically diverse nationally, both in terms of participation and in terms of funding. The first three graphs present the cumulative totals over the five years of SAMSI operation.

[Figure: Geographical Distribution of Funded Workshop Participants]

[Figure: Geographical Distribution of All Workshop Participants]

[Figure: Geographical Distribution of Undergraduate Workshop Participants]

[Figure: Geographical Distribution of Undergraduate Participants: 2006-07]

It is SAMSI’s policy always to attract and support the leading scientists, regardless of nationality, but otherwise to focus resources on domestic participants. The table below shows the nationality status of the participants who received some funding from SAMSI.

Year                      US Citizen or         Foreign National    Foreign National      TOTAL
                          Permanent Resident    Residing in US      Not Residing in US

2002-03                   209                   87                  36                    332
2003-04                   220                   90                  29                    339
2004-05                   158                   71                  21                    250
2005-06                   217                   101                 37                    355
2006-07 (as of 4/4/07)    193                   122                 53                    368

TOTAL                     997                   471                 176                   1644

Percentage of all funded
participants (1249)       79.82%                37.71%              14.09%

Broadening the DMS research impact: SAMSI’s national impact also depends on Institutional Diversity and the inclusion of participants whose home institutions are not already heavily supported by NSF Funding through DMS. Such inclusion develops the national research base by significantly increasing the number of individuals that can engage in cutting edge research. The SAMSI record in this regard during 2006-07 is excellent, as shown in the following table (for both funded participants and all participants). The ‘Other’ category primarily includes individuals from other disciplines, governmental agencies or laboratories, and industry.

2006-2007 SAMSI Participation

Funded Participants (to date), Home Institution by DMS Funding Level

                      Top 50 DMS Funded    51-200 DMS Funded    Other
# of Institutions     39                   38                   67
# of People           154                  102                  112
% People              41.8%                27.7%                30.4%
% SAMSI Funds         64.5%                21.8%                13.7%

All Participants (to date), Home Institution by DMS Funding Level

                      Top 50 DMS Funded    51-200 DMS Funded    Other
# of Institutions     41                   39                   116
# of People           290                  149                  202
% People              46.2%                23.2%                31.5%

3. Education

The impact of SAMSI courses and of the various components of the SAMSI Education and Outreach program is documented in Section I.E. Part 4 and in various program reports. We summarize here specific new initiatives and specific highlights of the program.

(i) A summer school was tried in advance of the SAMSI Computer Modeling program, and was very successful in jump-starting the research of postdoctoral fellows and graduate students in the program. We are similarly planning a series of educational activities in Fall 2007, in advance of the Spring 2008 program on Environmental Sensor Networks.

(ii) Two outreach workshops were held to expose undergraduate students from programs around the country to topics and research directions associated with the SAMSI Programs on High Dimensional Inference and Random Matrices and on Development, Assessment and Utilization of Complex Computer Models. One goal of these workshops was to illustrate the application and synergy between mathematics and statistics which goes far beyond that which students have seen in coursework. The overall objective was to broaden the perspective of students with regard to both future graduate studies and career choices.

(iii) The one-week SAMSI Workshop for Undergraduates encompassed three distinctive components.
• All tutorials and sessions were presented by SAMSI graduate students and postdocs under close supervision of directorate members, members of the Education and Outreach Committee, and local faculty.
• The workshop provided students with an intensive introduction to the synergy between applied mathematics and statistics in the context of physical applications.
• During one of the sessions, the students were introduced to a variety of experiments and each team collected its own physical data.

(iv) The overall goals of the ten-day Industrial Mathematical and Statistical Modeling Workshop for Graduate Students were twofold:
• Expose mathematics and statistics students to current research problems from government laboratories and industry which have deterministic and stochastic components;
• Expose students to a team approach to problem solving.
For the 2006 workshop, current research problems were presented by scientists from Advertising.com, Bank of America, GlaxoSmithKline, Jet Propulsion Laboratory, Lord Corporation, and MIT Lincoln Laboratory. On the final day of the workshop, each team gave a 30-minute oral presentation summarizing its results. The written reports were compiled as SAMSI Technical Report 2006-6, which can be obtained at http://www.samsi.info/reports/index.shtml.

E. Recommendations from the Renewal Site-Visit Report

The report of the Committee that visited SAMSI on October 8-11, 2006 was highly in favor of renewed funding for SAMSI. We consider here each of the recommendations from the report (given in italics) for the improvement of SAMSI.

1. Scientific Mission and Accomplishments

SAMSI should continue to pioneer and refine its “working group” methodology for attacking interdisciplinary research. Indeed, much of this year has been spent assimilating the new technologies for remote participation into the working group model.

SAMSI should cultivate more DOE lab-related affiliates and their technical topics, because it is a missed opportunity for both parties and DOE should readily recognize this as they become better acquainted with SAMSI capabilities. We have extensive interactions with LANL and NCAR, but not with other such labs. Perhaps the composition of the Scientific Advisory Board should emphasize individuals from the labs.

The advertising and promotion of SAMSI activities needs to be strengthened both within the statistical and mathematical communities and in those of the application disciplines. The continuing development of the website will no doubt be of great help in this regard as will the recent appointments of ASA and SIAM representatives on the governing board. See below for more on advertising and promotion.

Broadening the network of links to the application disciplines would expand the potential sources of program topics and also help in the evaluation of the impact of the work of SAMSI on scientific disciplines other than statistics and mathematics. We hope for results in this direction from the Scientific Advisory Board.

2. Participants and Human Resource Development

The undergraduate workshops have brought together students mainly from universities located in the East and the Midwest. We encourage the plan to attract a more national distribution of undergraduate students. The plans outlined in Section C indicate the various steps being taken to address this issue.

While participation by women and under-represented groups is substantial in all programs, more attention needs to be paid to reach out to a broader scientific community of potential applicants for post-doctoral fellows. In particular, post-doctoral fellow applicants and awardees should reflect the composition of recent Ph.D.'s with respect to their ethnicity. A similar comment applies to the identification of SAMSI program organizers.

We believe that the ethnicity composition of our post-doctoral fellows does appropriately reflect the community, modulo the constraints under which we make appointments. This recommendation perhaps arose because of the ethnic distribution of our current group of postdocs, which (randomly) happened to be rather skewed.

We strongly urge placement of a link to SAMSI on the front webpage of all local participating departments. This is being pursued.

Better provisioning of post-doctoral fellows with major computing resources (including the NSF supercomputer centers) is appropriate for those not otherwise attached to a university, corporate, or agency sponsor. All postdoctoral fellows are attached to a university.

3. Technical Outreach and Community Impact

SAMSI must do a better job of advertising the various opportunities available there, via their web pages and advertisements in outlets such as SIAM News and AMS Notices, to ensure that the opportunities SAMSI provides are open to all. This includes the opportunity to propose a workshop, to participate in the planning meetings for workshops and programs, and to attend workshops and programs. It also includes opportunities for postdocs, graduate students, undergraduates, and K-12 teachers. During the site visit, we emphasized one of the mechanisms (expanding networking) that we use to identify possible participants. SAMSI actually uses numerous strategies to reach and engage the national community, including annual one-page articles in the newsletters or bulletins of the AMS, ASA, IMS, and SIAM (and occasionally elsewhere, e.g. IEEE); advertisements in the newsletters of ASA, IMS, and SIAM for the postdoctoral program and visiting opportunities; posters of major activities sent to all math and stat/biostat departments (and many other places); regular announcements of all upcoming activities to the organizations in the affiliates program; mass e-mailings of upcoming workshops to relevant departments and organizations nationally (and internationally), including discipline-based mailing lists when appropriate; specific SAMSI sessions at the Joint Statistical Meetings, and often at SIAM meetings, together with outreach efforts with various groups at the meetings; informational receptions at the JSM and Joint Mathematics Meeting; representation at events emphasizing diversity, through which opportunities at SAMSI are presented; contacts through networking during the many visits of the members of the directorate to other conferences, universities and organizations.

The new SAMSI outreach program for K-12 teachers should develop materials that combine mathematics and statistics. Note that both mathematics and statistics are currently taught in US schools at the elementary, middle school, and secondary levels, but mostly as separate topics. We shall try to do so, if the program is funded.

SAMSI should facilitate transfer of the courses developed locally to other institutions. For example, some consideration could be given to long term visitors with firm plans to write books or extended review articles based on their courses. At a minimum, course notes should be organized and maintained in perpetuity on SAMSI web pages, perhaps with the assistance of graduate students or postdocs. We would have to provide some support to individuals for doing this, and need to decide if this is feasible based on our funding.

SAMSI should work harder to ensure that the math/stat results developed here are transferred back to the science and engineering communities who helped frame the problems in the first place. Two or three long-term visitors (semester or full year) from outside the mathematical sciences during a semester or year-long program would be beneficial in keeping the focus firmly on useful problems with a major impact. It is also recommended (see next section) that domain science representation be added to the GB and the NAC. We have had some success in having visitors from outside the mathematical sciences, but not consistent success. We hope the Scientific Advisory Board can help identify suitable individuals.

4. Governance and Administration

Domain science representation ought to be considered for the GB, for example, a domain scientist, likely from a Triangle university. We currently have an astronomer (Bruce Carney) and a chemist (John Simon) on the Governing Board.

More domain science representation should be considered for the NAC. Perhaps a couple of SAC members could be rotated through the NAC as voting members every year depending on what year-long programs are currently under consideration. Increasing national (and international) presence and breadth on the advisory boards will go a long way to enhancing the prestige and visibility of the Institute. Apart from this, we think that SAMSI is evolving splendidly and should be encouraged to continue. Indeed we do plan to have two Scientific Advisory Board members present at each NAC meeting. Allowing them to officially vote is certainly reasonable.

5. Success Metrics and Self Evaluation

SAMSI should work with the other DMS institutes to provide common measures of program success. This will be raised with the Institute Directors Committee.

SAMSI should solicit external feedback on completed programs from peers who were not participants. This should include comments from scientists in the domains that created the frame for the program.

This is, in part, done at transition workshops, where individuals who did not participate in the program are brought in to comment on the results reported from the program. More formal mechanisms for this will be considered.

6. Facilities

There is a need to relieve the space crunch in SAMSI to make it attractive for short-term visitors and local faculty to be resident for long working days in the SAMSI office conference room areas. Indeed.

Better provisioning for the first-year postdocs and short-term visitors not otherwise attached to a university, corporate, or agency sponsor would make a long-term SAMSI even more attractive. All postdocs and essentially all visitors are so attached, but we will strive to ensure that everyone is covered.

F. Evaluation by the SAMSI Governing Board (Bruce Carney, George Casella, Thomas Manteuffel, Vijay Nair, John Simon, Daniel Solomon – Chair)

The Governing Board provides broad oversight for the Institute’s administration, finances, and evaluation, and for relationships among the partnering institutions. As part of the annual evaluation, the Governing Board has elected to address four broad questions. That evaluation follows:

1) What are some outcomes of the synthesis of applied mathematics and statistics?

The synthesis of applied mathematics, statistics and the disciplinary sciences is a central tenet of the SAMSI mission. There are notable examples of this synthesis in specific SAMSI programs, but the extent varies substantially across the full portfolio.

The program on Development, Assessment and Utilization of Complex Computer Models provided a context in which statisticians and applied mathematicians worked closely together, and the mutual influence on each other's thinking was highly productive. One dramatic example that was reported is the activity of the working group on Granular Materials - Engineering Applications which tackled the problem of prediction of extreme lava flows from volcanoes, based on a merging of statistical and applied mathematical methodologies. This involved novel probabilistic modeling of events, and a host of statistical issues arising in the validation and utilization of computer models. This was a collaboration of applied mathematicians, statisticians, engineers, and geophysicists.

The National Defense and Homeland Security program had other examples of this synthesis. Indeed, the Agricultural Systems Working Group, which remained active through March 2007 and is comprised of mathematicians, statisticians and veterinary science faculty and students, introduced a new paradigm for characterizing the richness of mathematical models based on the size of stochastic disturbance terms, producing a paper (recently submitted) entitled “Stochastic and Deterministic Models for Agricultural Production Networks.”

The Universality working group of the program on High Dimensional Inference and Random Matrices combined applied mathematicians working in integrable systems with statisticians. Although these viewpoints are very different, it was reported that the statistical issues associated with random matrices significantly influenced the perspective on integrable systems.

The synergy between statistics, mathematics and domain sciences has had a significant impact on participants and postdocs involved with the programs. For instance, postdoc Elaine Spiller was highly active in the Granular Flow working group mentioned above. Her background was applied mathematics, but she has now become very knowledgeable with respect to the host of statistical issues involved with validating and predicting with computer models.

The two 2-day undergraduate workshops associated with the High Dimensional Inference and Random Matrices and the Development, Assessment and Utilization of Complex Computer Models programs had students from both statistics and mathematics participating in lectures and interactive activities having both statistical and applied mathematical components. Likewise the weeklong undergraduate workshop focused on topics that have both stochastic and deterministic components, and students from both backgrounds were chosen to attend. These opportunities provided students with an in-depth exposure to the synergy between the two disciplines.

2) Is the impact of SAMSI on science and human resources growing?

Section D of the Executive Summary highlights some of the developments in science and education that have occurred through SAMSI’s programs. Their impact is potentially great on the participants as it offers a redirection of research effort for senior participants and a formative experience for postdocs and other junior scientists.

In addition to the scientific advances discussed in the previous section, SAMSI seems to be having a growing impact on other sciences. For instance, the impact on astronomy and physics of the Astrostatistics program appears to be considerable. The Exoplanets working group developed state-of-the-art methodology for detecting and searching for exoplanets. The Source and Feature Detection working group developed new highly structured models for combining sources of information, and is now tailoring the models to the specific instrumentation and scientific questions of several NASA missions, including RHESSI (solar data), Chandra (X-ray), GLAST (gamma-ray), and EGRET (gamma-ray). The Phystat working group studied the problem of detection of signals in the presence of multiplicities (e.g., when one is essentially conducting thousands of tests simultaneously), and made fundamental advances in formulation and computation relating to the problem. As a side benefit of the interaction between physicists and statisticians, a powerful statistical technique that was heavily discussed in the working group was subsequently brought to the DØ Single Top Group. Using this methodology, on December 8, 2006, the DØ group announced that it had finally detected the very elusive, massive top quark, the final member of the six-quark family.

The lists of refereed publications associated with SAMSI programs (see Section I.G. of the full report) provide another measure of evidence of impact on the mathematical and disciplinary sciences.

SAMSI continues its strong commitment to the development of human resources in the mathematical sciences. Its impacts are discussed in Sections I.B, I.C and I.H (which highlights diversity) of the full report. Participation by minorities and women in SAMSI programs has remained stable and high (above 30%) since SAMSI's very first programs. The proportion of young researchers (students and new researchers) is also high, this year running around 40%.

That postdoctoral fellows at SAMSI are imbued with the SAMSI vision of research is evident in the fact that 21 of 22 SAMSI 2002–5 Postdoctoral Fellows and Associates have research/academic positions; of these, 16 continue research begun at SAMSI and 15 continue SAMSI collaborations. Their research output appears in 28 journals in mathematics and statistics plus 15 domain science journals.

3) Is the national recognition and respect for SAMSI growing?

SAMSI programs this year and all future programs have entirely non-local leadership. This year’s leaders were highly eminent scholars, including Peter Bickel (NAS), Iain Johnstone (NAS), Helene Massam, Douglas Nychka and Craig Tracy for the High Dimensional Inference and Random Matrices Program, and Susie Bayarri, Bruce Pitman, Peter Reichert, Tom Santner, Darren Wilkinson, and Dave Higdon for the program on Development, Assessment and Utilization of Complex Computer Models.

The detailed participant lists for concluded programs provide ample evidence of the national and international draw of SAMSI activities. Other evidence of SAMSI’s reach is in the partnerships with other organizations, including Los Alamos National Laboratory, the National Center for Atmospheric Research, the (Canadian) National Program on Complex Data Structures, SANDIA, and the Center for Astrostatistics at Penn State.

Applications to the postdoctoral program were significantly up again this year – totaling 140 applicants, an increase of 40% over 2005-06 – in spite of the fact that statistics Ph.D. students can typically find immediate tenure track jobs and applied mathematics postdoctoral candidates have many other options. The directorate observed that the top statistics and probability candidates have been hearing of the considerable benefits of going through a SAMSI postdoctoral experience, while the top applied mathematics candidates are being attracted by a growing recognition of the importance of integrating applied mathematics and statistics.

Long-term visitors to SAMSI also seem to be sharply up. In 2006-07 there were 21 visitors in residence for three or more months, up from 12 the previous year. As many as 28 long-term visitors have applied to come next year, with the actual number dependent on the NSF budget for next year. The national recognition and respect is also indicated by the excitement that seems to be resulting from the possibility for individuals who are not resident at SAMSI to participate in SAMSI working groups; 87 researchers participated remotely as individuals in working groups (31 in Random Matrices, 50 in Complex Computational Models), while over 20 participated in the Random Matrices Program via video-conference as a group based at UC Berkeley-Stanford.

This continually increasing national presence of SAMSI is also evidenced by the workshops. There appears to be very strong interest in holding SAMSI workshops in other locations nationally, as reflected in the fact that 8 of this year’s 20 workshops were held at other locations: AIM Research Conference Center, Banff, NCAR (3), Penn State, Pittsburgh and Simon Fraser. Last year, 7 of 20 workshops were held off-site.

4) Is the Directorate meeting the needs of an evolving SAMSI?

The directorate model continues to serve SAMSI very well, and transitions in the directorate have gone smoothly. This past year, Alan Karr and Nell Sedransk (statisticians) shared the NISS associate directorship, much to the advantage of SAMSI. Over the next six months, Sedransk will assume the major role, with Karr continuing in an advisory capacity as well as focusing on specific issues such as the building expansion. Associate Director Chris Jones (applied mathematician) was gone for the Spring semester, and Jim Damon served admirably as his stand-in. Ralph Smith (applied mathematician) continued his strong contributions to the directorate. At this point only one-plus of the four original members of the SAMSI directorate (statistician Jim Berger) remains, but the directorate is functioning more effectively and efficiently than ever.

The directorate is successfully managing the transition to complete national programming, through an evolving delineation of the role of the directorate in facilitating development of SAMSI programs. As SAMSI has moved exclusively to national leadership of programs, the need for strong directorate support has become clear. This support for program development includes contacting program participants (ranging from senior visitors to postdocs to faculty releases) that are suggested by the program leaders; explaining key elements of SAMSI programs, such as working groups, to the leaders and helping in their formulation; and, of course, organization and planning of workshops.

We reviewed the recommendations arising from the Renewal Site Visit Report (see Section E of the Executive Summary), and feel that the Directorate is effectively addressing (or has addressed) those issues that can be addressed, subject to budget limitations. We note, in passing, that one of the recommendations was to include domain scientist representation on the Governing Board; indeed, one of us (Bruce Carney) is an astronomer and another (John Simon) is a chemist.

The directorate has also been aggressive in seeking technological solutions to issues, for instance in the continuing integration of technology to enhance working group communication with non-local individuals, and in the planned incorporation of more sophisticated data bases (from the IMA) to enhance the capabilities for planning and evaluation.

The Governing Board Chair and the SAMSI Director have a biweekly telephone conference at which administrative and personnel matters are regularly discussed and issues addressed where they have arisen. There is also excellent cooperation among the partner universities and NISS to ensure that obligations are met and that SAMSI continues to flourish.

Table of Contents

0. Executive Summary
I. Annual Progress Report
   A. Program Personnel
      1. List of Programs and Organizers
      2. Program Core Participants
   B. Postdoctoral Fellows and Associates
      1. Overview of Postdoc Activities and Mentoring Strategies
      2. 2006-07 Postdoc Activities and Mid-Program Reports
      3. Postdoc Reports and Evaluations, with Mentor’s Comments
      4. Tracking of Previous SAMSI Postdocs
   C. Graduate Student Participation
   D. Consulted Individuals
   E. Program Activities
      1. Development, Assessment and Utilization of Computer Models
      2. High Dimensional Inference and Random Matrices
      3. Multiplicity and Reproducibility in Scientific Studies
      4. Education and Outreach Program
   F. Industrial and Governmental Participation
   G. Publications and Technical Reports
   H. Diversity Efforts
   I. External Support and Affiliates
   J. Advisory Committees
   K. Income and Expenditures
II. Special Report: Program Plan
   A. Programs for 2007-2008
      1. Risk Analysis, Extreme Events and Decision Theory
      2. Random Media
      3. Environmental Sensor Networks
      4. Challenges in Dynamic Treatment and Multistage Decision-Making
      5. The Geometry and Statistics of Shape Spaces
      6. Brainstorming Meetings
   B. Scientific Themes for Later Years
   C. Budget for 2007-2008
   D. Financial Plan for 2007-2008
Appendix
   A. Final Program Report: National Defense and Homeland Security
   B. Final Program Report: Astrostatistics
   C. Workshop Participant Lists
   D. Workshop Programs and Abstracts
   E. Workshop Evaluations

I. Annual Progress Report

The previous annual progress report was complete in all details only through April, 2006. Hence, we also report activities in Year 4 programs that occurred subsequently and were not itemized in the report. These Year 4 programs were National Defense and Homeland Security and Astrostatistics; their final reports are in Appendices A and B, respectively.

A. Program Personnel

1. Program and Activity Organizers

Program Organizers

Name | Affiliation | Field

Astrostatistics (2006 SAMSI Program)
G. Jogesh Babu (Chair) | Penn State U | Statistics
Jim Berger | SAMSI |
Peter Bickel | Berkeley | Statistics
Alanna Connors | Eureka Scientific |
Eric Feigelson | Penn State U | Astronomy
Tom Loredo | Cornell U | Astronomy
Donald Richards | Penn State U | Statistics
Larry Wasserman | Carnegie Mellon U | Statistics

Development, Assessment and Utilization of Complex Computer Models (2006-07 SAMSI Program)
M.J. Bayarri (Chair) | U of Valencia | Statistics
Jim Berger | SAMSI | Statistics
Derek Bingham | Simon Fraser U | Statistics
David Higdon | Los Alamos Nat Lab | Statistics
Scott Mitchell | Sandia Nat Lab |
Bruce Pitman | SUNY-Buffalo | Mathematics
Peter Reichert | EAWAG |
Thomas Santner | Ohio State U | Statistics
Mary Wheeler | U of Texas-Austin | Mathematics
Darren Wilkinson | U of Newcastle | Mathematics

Environmental & Ecological Models Subprogram
Carl Bergstrom | U of Washington | Biology
Jim Clark | Duke U | Environment & Earth Sci
Montse Fuentes | North Carolina State U | Statistics
Doug Nychka | NCAR | IMAGe
Ken Reckhow | Duke U | Environment & Earth Sci
Peter Reichert (Chair) | EAWAG |
Jonathan Rougier | U of Bristol | Statistics
Nell Sedransk | NISS and SAMSI | Statistics
Leonard Smith | London Sch of Econ | Statistics

Uncertainty in Models of Granular Materials: Sources and Consequences Subprogram
Sorin Mitran | UNC | Mathematics
Luis Pericchi | U of Puerto Rico | Statistics
Bruce Pitman (Chair) | SUNY-Buffalo | Mathematics
Ralph Smith | SAMSI and NCSU | Mathematics

Engineering Subprogram
Jim Berger | SAMSI | Statistics
Mary Fortier | General Motors |
David Higdon | Los Alamos Nat Lab | Statistics
Scott Mitchell | Sandia Nat Lab |
Angela Patterson | General Electric |
Thomas Santner (Chair) | Ohio State U | Statistics
Laura Swiler | Sandia Nat Lab |
Shih-Chung Tsai | General Motors |

Biological Modeling Subprogram
Pierre Gremaud | North Carolina State U | Mathematics
Greg Rempala | U of Louisville | Mathematics
Ralph Smith | SAMSI and NCSU | Mathematics
Darren Wilkinson (Chair) | U of Newcastle | Mathematics

Methodology Subprogram
M.J. Bayarri (Chair) | U of Valencia | Statistics
Jim Berger | SAMSI | Statistics
Michael Goldstein | U of Durham | Mathematical Sciences
Anthony O'Hagan | Sheffield U | Probability and Statistics
Jerome Sacks | NISS | Statistics
Robert Wolpert | Duke U | Statistics
Henry Wynn | London Sch of Econ | Statistics

High Dimensional Inference and Random Matrices (2006-07 SAMSI Program)
Myles Allen | Oxford U | Atmospheric Physics
Estelle Basor | California Polytechnic | Mathematics
Peter Bickel | U of California-Berkeley | Statistics
 | Stanford U | Statistics
 | Stanford U | Statistics
 | Princeton U | Oper Res and Finan Eng
Iain Johnstone (Chair) | Stanford U | Statistics
Chris Jones | UNC and SAMSI | Mathematics
Helene Massam | York U | Math and Stat
Ken McLaughlin | U of Arizona | Mathematics
Doug Nychka | NCAR | IMAGe
Neil O'Connell | U of Warwick | Mathematics
Ben Santner | Lawrence Livermore |
Jack Silverstein | North Carolina State U | Mathematics
G.W. Stewart | U of Maryland | Computer Science
Craig Tracy | U of California-Davis | Mathematics
Ofer Zeitouni | U of Minnesota | Applied Mathematics

Multiplicity and Reproducibility in Scientific Studies (2006 SAMSI Summer Program)
James Berger | SAMSI | Statistics
Raymond Carroll | Texas A&M U | Statistics
Peter Mueller | MD Anderson Cancer | Biostatistics
Juliet Shaffer | U of California-Berkeley | Statistics
Peter Westfall (Chair) | Texas Tech U | Business Administration
Stan Young | NISS | Statistics

Education & Outreach Program
Ralph Smith (Chair) | North Carolina State U | Applied Math
Johnny Houston | Elizabeth City State | Math and CS
Rachel Levy | North Carolina State U | Mathematics
J. Blair Lyttle | Enloe HS, Raleigh | Statistics
Negash Medhin | North Carolina State U | Mathematics
Daniel Teague | NC Sch Math & Sci | Mathematics
Wei Feng | UNC-Wilmington | Math and Stat

Activity Organizers

2005-06 Programs

Activity | Name(s) | Year

Astrostatistics Program
Astrostatistics Transition Workshop (in conjunction with SCMA VI at Penn State) -- June 12-15, 2006 | Jogesh Babu, Jim Berger, Eric Feigelson, Krzysztof Gorski, Tom Loredo, Vicent Martinez, Larry Wasserman, Michael Woodroofe | 2005-06
Astrostatistics Transition Workshop (in conjunction with PHYSTAT at BIRS) -- July 15-20, 2006 | Peter Bickel, Helene Massam, Mike West | 2006-07

Education and Outreach Program
SAMSI-CRSC Undergraduate Workshop -- May 22-26, 2006 | Karen Chiswell, Cammey Cole, Lesa Denning, Ralph Smith, Kim Weems | 2005-06
12th Annual Conference for African American Researchers in Mathematical Sciences (CAARMS) -- June 20-23, 2006 | Idris Assani, Patrick Eberlein, Chris Jones, William Massey | 2005-06

2006-07 Programs

Activity | Name(s) | Year

Development, Assessment and Utilization of Complex Computer Models
Summer School on the Design and Analysis of Computer Experiments (at IRMACS, Simon Fraser U) -- August 11-16, 2006 | M.J. Bayarri, Jim Berger, Derek Bingham, David Higdon, Jerry Sacks, Will Welch | 2006-07
Development, Assessment and Utilization of Complex Computer Models (CompMod) Opening Workshop & Tutorials -- September 10-13, 2006 | M.J. Bayarri, Bruce Pitman, Peter Reichert, Tom Santner, Darren Wilkinson | 2006-07
CompMod Joint Engineering and Methodology Subprogram Workshop -- October 26-27, 2006 | M.J. Bayarri, Tom Santner, Robert Wolpert | 2006-07
CompMod Biosystems Modeling Workshop -- March 5-7, 2007 | Greg Rempala, Ralph Smith, Darren Wilkinson | 2006-07
CompMod Joint SAMSI/MUCM Mid-Program Workshop -- April 2-3, 2007 | M.J. Bayarri, Tony O'Hagan, Peter Reichert, Tom Santner | 2006-07
CompMod Terrestrial Mid-Program Workshop -- April 4, 2007 | Jim Clark | 2006-07

High Dimensional Inference and Random Matrices
High Dimensional Inference and Random Matrices (HDIRM) Opening Workshop & Tutorials -- September 17-20, 2006 | Peter Bickel, Iain Johnstone, Chris Jones, Helene Massam, Doug Nychka, Nell Sedransk, G.W. Stewart, Craig Tracy | 2006-07
Random Matrices Program Bayesian Focus Week -- October 30-November 3, 2006 | Peter Bickel, Helene Massam, Mike West | 2006-07
Large Graphical Models and Random Matrices Workshop -- November 9-11, 2006 | , Helene Massam, Nanny Wermuth | 2006-07
Workshop on Geometry, Random Matrices and Statistical Inference -- January 16-19, 2007 | Mischa Belkin, James Damon, Feng Liang, Sayan Mukherjee | 2006-07

Summer Program on Multiplicity and Reproducibility
Multiplicity and Reproducibility in Scientific Studies Opening Workshop -- July 10-12, 2006 | Peter Mueller, Stan Young | 2006-07
Multiplicity and Reproducibility in Scientific Studies Closing Workshop -- July 27-28, 2006 | Peter Mueller, Stan Young | 2006-07

2006-07 Education and Outreach
SAMSI-CRSC Industrial Mathematical & Statistical Modeling Workshop for Graduates -- July 24-August 1, 2005 | Alina Chertock, Mansoor Haider, Mette Olufsen, Ralph Smith | 2006-07
Undergraduate Two-Day Workshop -- November 17-18, 2007 | Ralph Smith | 2006-07
Undergraduate Two-Day Workshop -- March 2-3, 2007 | Ralph Smith | 2006-07

Co-sponsored and Informal Meetings and Workshops
T-O-Y 2007 Workshop on Geophysical Models (at NCAR) -- November 13-14, 2006 | Derek Bingham, Montserrat Fuentes | 2006-07
Dynamics of Infectious Diseases One-day Working Group Meeting -- February 16, 2007 | H.T. Banks, Ariel Cintron-Arias | 2006-07
Dynamics of Infectious Diseases One-day Working Group Meeting -- March 16, 2007 | H.T. Banks, Ariel Cintron-Arias | 2006-07

2. Program Core Participants and Targeted Experts

For each of the major programs, the following tables present the key participants for the programs. The participants are categorized and coded as follows:

Code | Category | Description
DL | Distinguished Lecturer | Program-affiliated speaker
FF | Faculty Fellow | Teaching release from local university
FA | Faculty Associate | Program-affiliated local faculty for which no release time is allocated
GF | Graduate Student Fellow | Student from local university, assigned to a specific program and paid a stipend
GA | Graduate Student Associate | Program-affiliated local student with no stipend
VGF | Visiting Graduate Fellow | Non-local student, paid only expenses
NRV | New Researcher Visitor | Non-local researchers (holding PhD 5 years or less) brought in for short intervals for interaction with program participants
NRC | New Researcher Core Visitor | Non-local researchers (including fellows) who play a major role in program activities
PF | Postdoctoral Fellow | Program-affiliated individual, paid a stipend in association with a local university
PA | Postdoctoral Associate | Program-affiliated individual with appointment shorter than 1 year
SV | Senior Visitor | Researcher (holding PhD 6 or more years) brought in for short intervals for interaction with program participants
SC | Senior Core Visitor | Non-local researchers (including fellows) who play a major role in program activities
UF | University Fellow | Key program participant, visiting for semester or year, whose primary support is cost-shared with a partner university
WG | Working Group Participant | Local participants of SAMSI working groups (not fellows, visitors or persons otherwise designated)
WGR | Remote Working Group Participant | Remote participants of SAMSI working groups (not otherwise designated)

Development, Assessment and Utilization of Complex Computer Models

Program Core Participants and Targeted Experts

Last Name | First Name | Gender | Affiliation | Department | Status
Banks | H.T. | M | North Carolina State U | Mathematics | WG
Bartel | Don | M | Cornell U | Mechanical and Aerospace Engineering | WGR
Bautista | Dianne | F | Ohio State U | Statistics | VGF
Bayarri | Susie | F | U of Valencia | Statistics | SC
Bengtsson | Thomas | M | Bell Labs | | WGR
Bondell | Howard | M | North Carolina State U | Statistics | WG
Borggaard | Jeff | M | Virginia Polytechnic Inst and State U | Mathematics | WGR
Bu | Sunyoung | F | U of North Carolina-Chapel Hill | Mathematics | GF
Burns | John | M | Virginia Polytechnic Inst and State U | Mathematics | WGR
Chen | Tsui-Long | M | North Carolina State U | Statistics | GF
Childers | Adam | M | Virginia Polytechnic Inst and State U | Mathematics | WGR
Cintron-Arias | Ariel | M | SAMSI and North Carolina State U | Mathematics | PF
Clark | Jim | M | Duke U | Environment and Earth Sciences | FA
Cliff | Eugene | M | Virginia Polytechnic Inst and State U | Engineering | WGR
Cole | Cammey | F | Meredith College | Mathematics and Computer Science | WG
Courbaud | Benoit | M | Duke U | Environment and Earth Sciences | WG
Crooks | James | M | SAMSI and Duke U | Statistics | PF
Cui | Tiangang | M | U of Auckland | Mathematics | VGF
Davidian | Marie | F | North Carolina State U | Statistics | WG
Davis | Jimena | F | North Carolina State U | Mathematics | WG
Dean | Angela | F | Ohio State U | Statistics | WGR
Dediu | Sava | M | North Carolina State U | Mathematics | WG
Devault | Kristin | F | North Carolina State U | Mathematics | GF
Dietze | Mike | M | Harvard U | Organismic and Evolutionary Biology | WGR
Dinwoodie | Ian | M | Duke U | Statistics and Decision Sciences | WG
Draghicescu | Dana | F | Hunter College | Statistics | SV
Drignei | Dorin | M | Oakland U | Mathematics and Statistics | WGR
Ernstberger | Stacey | F | North Carolina State U | Mathematics | WG
Foley | Kristen | F | North Carolina State U | Mathematics | WG
Fuentes | Montserrat | F | North Carolina State U | Statistics | FF
Gattiker | James | M | Southampton U | National Oceanography Center | WGR
Gillespie | Daniel | M | Gillespie Consulting | | DL
Goldstein | Michael | M | Durham U | Mathematical Sciences | SC
Gotwalt | Chris | M | SAS | | WG
Gray | Genetha | F | Sandia National Labs | | WGR
Greenshtein | Eitan | M | SAMSI and North Carolina State U | Mathematics | WG
Gremaud | Pierre | M | North Carolina State U | Mathematics | FA
Grove | Sarah | F | North Carolina State U | Mathematics | WG
Guillas | Serge | M | Georgia Institute of Technology | Mathematics | NRC
Han | Gang | M | Ohio State U | Statistics | VGF
Higdon | David | M | Los Alamos National Laboratory | | WGR
Holland | David | M | Environmental Protection Agency | | WG
House | Leanna | F | Durham U | Mathematical Sciences | WGR
Hu | Shuhua | F | North Carolina State U | Center for Research in Scientific Computing | WG
Huang | Jingfan | M | U of North Carolina-Chapel Hill | Mathematics | FF
Huber | Mark | M | Duke U | Mathematics and Statistics | WG
Hung | Ying | F | Georgia Institute of Technology | Industrial and Systems Engineering | WGR
Jarrah | Abdul Salam | M | Virginia Polytechnic Inst and State U | Virginia Bioinformatics Institute | WGR
Joyce | Jennifer | F | U of North Carolina-Chapel Hill | Mathematics | GF
Joyner | Sarah Lynn | F | North Carolina State U | Mathematics | WG
Kang | Daiwen | | Environmental Protection Agency | | WG
Kao | Jason | M | U of Georgia | Statistics | WGR
Kaufman | Cari | F | SAMSI | Statistics | PF
Kepler | Grace | F | North Carolina State U | Mathematics | WG
Kottas | Thanasis | M | U of California-Santa Cruz | Applied Mathematics and Statistics | WGR
Laubenbacher | Reinhard | M | Virginia Polytechnic Inst and State U | Virginia Bioinformatics Institute | WGR
Lee | Herbie | M | U of California-Santa Cruz | Applied Mathematics and Statistics | WGR
Liu | Fei | F | Duke U | Statistics and Decision Sciences | GF
Lloyd | Alun | M | North Carolina State U | Mathematics | WG
Loredo | Tom | M | Cornell U | Astronomy | WGR
Lunagomez | Simon | M | Duke U | Statistics and Decision Sciences | GF
Ma | Chunsheng | M | Wichita State U | Mathematics and Statistics | WG
Mandal | Abhyuday | M | U of Georgia | Statistics | WGR
Maniyar | Dharmesh | M | Aston U | Information Engineering | WGR
Martinez-Canales | Monica | F | Sandia National Labs | | WGR
McHenry | John | M | Baron Systems and North Carolina State U | Marine, Earth and Atmospheric Sciences | WG
McKinley | Scott | M | Duke U | Mathematics | WG
McMahon | Sean | M | Duke U | Environment and Earth Sciences | WG
Mitchell | Scott | M | Sandia National Labs | | WGR
Mitran | Sorin | M | U of North Carolina-Chapel Hill | Mathematics | FF
Moore | Lisa | F | Los Alamos National Laboratory | | WGR
Morris | Max | M | Iowa State U | Statistics | WGR
Mortveit | Henning | M | Virginia Polytechnic Inst and State U | Virginia Bioinformatics Institute | WGR
Newbury | Golnar | | Virginia Polytechnic Inst and State U | Mathematics | WGR
Nichols | Nancy | F | U of Reading | Applied Mathematics | WGR
Novak | Vera | F | Harvard U | Beth Israel Deaconess Medical Center | SC
O'Hagan | Tony | M | U of Sheffield | Probability & Statistics | SC
Olufsen | Mette | F | North Carolina State U | Mathematics | WG
Paciorek | Chris | M | Harvard U | Public Health | WGR
Paredes-Alvarez | Betty | F | Virginia Polytechnic Inst and State U | Mathematics | WGR
Patra | Abani | M | SUNY-Buffalo and NSF | Engineering | WGR
Patterson | Angela | F | General Electric | | SC
Paulo | Rui | M | Technical U of Lisboa | | WGR
Pericchi | Luis | M | U of Puerto Rico | Statistics | SC
Perry | Mark | M | London School of Economics | Statistics | WGR
Pitman | Bruce | M | SUNY-Buffalo | College of Arts & Sciences | SC
Qian | Zhiguang | M | U of Wisconsin | Statistics | WGR
Rautenberg | Carlos | M | Virginia Polytechnic Inst and State U | Mathematics | WGR
Reese | Shane | M | Brigham Young U | Statistics | WGR
Reich | Brian | M | North Carolina State U | Statistics | WG
Reichert | Peter | M | Swiss Federal Inst of Aquatic Sci & Technology | | SC
Reinman | Grant | M | Pratt and Whitney | | WGR
Rougier | Jonathan | M | Durham U | Mathematical Sciences | UF
Sacks | Jerome | M | National Institute of Statistical Sciences | | SC
Sain | Steve | M | NCAR | IMAGe | WGR
Samuels | Johnny | M | North Carolina State U | Mathematics | WG
Sanso | Bruno | M | U of California-Santa Cruz | Applied Mathematics and Statistics | WGR
Santner | Tom | M | Ohio State U | Statistics | UF
Shearer | Michael | M | North Carolina State U | Mathematics | WG
Shoemaker | Christine | F | Cornell U | Civil and Environmental Engineering | WGR
Sitter | Randy | M | Simon Fraser U | Statistics | WGR
Smith | Leonard | M | London School of Economics | Statistics | SC
Spiller | Elaine | F | SAMSI | Mathematics | PF
Steinberg | David | M | Tel-Aviv U | Mathematics and Statistics | SC
Storlie | Curtis | M | SAMSI and North Carolina State U | Statistics | PA
Sutton | Daniel | M | Virginia Polytechnic Inst and State U | Mechanical Engineering | WGR
Sutton | Karyn | F | North Carolina State U | Mathematics | WG
Swall | Jenise | F | Environmental Protection Agency | | WG
Swiler | Laura | F | Sandia National Labs | | WGR
Taddy | Matt | M | U of California-Santa Cruz | Applied Mathematics and Statistics | WGR
Toman | Blaza | F | NIST | | WGR
Tsai | Shih-Chung | | General Motors | | WGR
Veliz-Cuba | Alan | M | Virginia Polytechnic Inst and State U | Mathematics | WGR
Vera-Licona | Paola | F | Virginia Polytechnic Inst and State U | Mathematics | WGR
Vernieres | Guillaume | M | SAMSI and U of North Carolina-Chapel Hill | Mathematics | PF
White | Gentry | M | SAMSI and North Carolina State U | Statistics | PA
Wilkinson | Darren | M | Newcastle U | Mathematics and Statistics | SC
Wolpert | Robert | M | Duke U | Statistics and Decision Sciences | FF
Wynn | Henry | M | London School of Economics | Statistics | SC
Zhao | Peng | F | Harvard U | Beth Israel Deaconess Medical Center | NRV
Zhao | Weija | M | Qingdao U | Mathematics | SC
Zhu | Zhengyuan | M | U of North Carolina-Chapel Hill | Statistics and Operations Research | WG
Zietsman | Lizette | F | Virginia Polytechnic Inst and State U | Mathematics | WGR

High Dimensional Inference and Random Matrices

Program Core Participants and Targeted Experts

Last Name | First Name | Gender | Affiliation | Department | Status
Ahn | Jeongyoun | F | U of Georgia | Statistics | WGR
Airoldi | Edo | M | Carnegie Mellon U | Computer Science | WGR
Belkin | Mikhail | M | Ohio State | Computer Science | SC
Belov | Sergei | M | Duke U | Mathematics | GF
Berger | James | M | SAMSI and Duke U | Statistics | WG
Bickel | Peter | M | U of California-Berkeley | Statistics | SV
Cao | Hongyan | F | U of North Carolina-Chapel Hill | Statistics | GF
Carvalho | Carlos | M | Duke U | Statistics and Decision Sciences | WG
Choup | Leonard | M | U of California-Davis | Mathematics | WGR
Cox | David | M | Nuffield College | | DL
Davis | Ginger | F | U of Virginia | Systems and Information Engineering | WGR
Dey | Dipak | M | U of Connecticut | Statistics | WGR
DiCiccio | Thomas | M | Cornell U | Social Statistics | WGR
Dobra | Adrian | M | U of Washington | Statistics | WGR
Donoho | David | M | Stanford U | Statistics | DL
Dumitriu | Ioana | F | U of Washington | Mathematics | NRV
El Karoui | Noureddine | M | U of California-Berkeley | Statistics | NRC
Fan | Jianqing | M | Princeton U | Operations Research and Financial Eng | SV
Fan | YingYing | F | Princeton U | Operations Research and Financial Eng | VGF
Far | Reza Rashidi | M | Queen's U | Mathematics and Statistics | WGR
Fuentes | Montserrat | F | North Carolina State U | Statistics | FA
Gao | Zhenglei | F | Duke U | Statistics and Decision Sciences | GF
Goetze | Friedrich | M | Bielefeld U | Mathematics | WGR
Greenshtein | Eitan | M | Purdue U | Statistics | UF
Guhr | Thomas | M | Lunds Universitet | Matematisk Fysik, LTH | SV
Guillas | Serge | M | Georgia Institute of Technology | Mathematics | WG
Hegerl | Gabriele | F | Duke U | Environment and Earth Sciences | FA
Houdre | Christian | M | Georgia Institute of Technology | Mathematics | NRC
Hoyle | David | M | U of Manchester | NIBHI, School of Medicine | WGR
Huber | Mark | M | Duke U | Mathematics and Statistics | WG
Huo | Xiaoming | M | Georgia Institute of Technology | Industrial and Systems Engineering | WGR
Ipsen | Ilse | F | North Carolina State U | Mathematics | FF
Jang | Woncheol | M | Duke U | Statistics and Decision Sciences | WGR
Jing | Naihuan | M | North Carolina State U | Mathematics | WG
Johnstone | Iain | M | Stanford U | Statistics | SV
Kaufman | Cari | F | SAMSI and Duke U | Statistics | PA
Koev | Plamen | M | Massachusetts Institute of Technology | Mathematics | WGR
Kolenikov | Stanislav | M | U of Missouri-Columbia | Statistics | WGR
Konno | Yoshihiko | M | Japan Women's U | Faculty of Science | WGR
Krishnapur | Manjunath | M | SAMSI and U of North Carolina-Chapel Hill | Mathematics | PF
Last | Mike | M | NISS | | PA
Lee | Yoonkyung | F | Ohio State U | Statistics | NRC
Lefew | William | M | Duke U | Mathematics | GF
Levina | Liza | F | U of Michigan | Statistics | WGR
Li | Lexin | M | North Carolina State U | Statistics | FA
Liang | Feng | F | U of Illinois-Urbana Champaign | Statistics | NRC
Lin | Xiaodong | M | U of Cincinnati | Mathematics | WGR
Litherland | Trevis | M | Georgia Institute of Technology | Mathematics | VGF
Liu | Peng | M | North Carolina State U | Statistics | WG
Liu | Yufeng | M | U of North Carolina-Chapel Hill | Statistics | FF
Lv | Jinchi | M | Princeton U | Mathematics | VGF
Ma | Chunsheng | M | Wichita State U | Mathematics and Statistics | UF
Massam | Helene | F | York U | Mathematics and Statistics | SC
Miller | Peter | M | U of Michigan | Mathematics | UF
Mingo | James A. | M | Queen's U | Mathematics and Statistics | SV
Mukherjee | Sayan | M | Duke U | Statistics and Decision Sciences | FF
Nadler | Boaz | M | Weizmann Institute of Science | Computer Science and Applied Mathematics | NRV
Nychka | Doug | M | NCAR | IMAGe | WGR
Oraby | Tamer | M | U of Cincinnati | Mathematical Sciences | WGR
Pal | Jayanta | M | SAMSI and Duke U | Statistics and Decision Sciences | PF
Park | Junyong | M | U of Maryland-Baltimore County | Mathematics | WGR
Paul | Debhashis | M | U of California-Davis | Statistics | NRC
Peng | Jie | F | U of California-Davis | Statistics | WGR
Pourahmadi | Mohsen | M | Northern Illinois U | Statistics | WGR
Qiao | Xingye | M | U of North Carolina-Chapel Hill | Statistics | GF
Rajaratnam | Bala | M | SAMSI and Duke U | Statistics and Decision Sciences | PF
Rao | Raj | M | Massachusetts Institute of Technology | Electrical Engineering and Computer Science | NRV
Rempala | Greg | M | U of Louisville | Mathematics | SC
Richards | Don | M | Pennsylvania State U | Statistics | SC
Rider | Brian | M | U of Colorado | Mathematics | WGR
Ritov | Ya'acov (Jacob) | M | Hebrew U of Jerusalem | Statistics | WGR
Rumanov | Igor | M | U of California-Davis | Mathematics | WGR
Schoolfield | Clyde | M | U of Florida | Statistics | WGR
Schwartzman | Armin | M | Stanford U | Statistics | WGR
Selee | Teresa | F | North Carolina State U | Mathematics | GF
Sharma | Dhruv | M | North Carolina State U | Statistics | GF
Shen | Haipeng | M | U of North Carolina-Chapel Hill | Statistics | FA
Silverstein | Jack | M | North Carolina State U | Mathematics | FF
Smith | Steven | M | Massachusetts Institute of Technology | Lincoln Laboratory | WGR
Spiller | Elaine | F | SAMSI and Duke U | Mathematics | PA
Stefanski | Len | M | North Carolina State U | Statistics | FF
Sun | Dongchu | M | U of Missouri-Columbia | Statistics | WGR
Talih | Makram | M | CUNY-Hunter College | Mathematics and Statistics | NRC
Tracy | Craig | M | U of California-Davis | Mathematics | SV
Truong | Young | M | U of North Carolina-Chapel Hill | Biostatistics | FF
Venakides | Stephanos | M | Duke U | Mathematics | FF
Wermuth | Nan | F | Chalmers U of Technology | Mathematical Sciences | SC
Wu | Yichao | M | Princeton U | Operations Research and Financial Eng | WGR
Xu | Hua | M | Georgia Institute of Technology | Mathematics | VGF
Yu | Bin | F | U of California-Berkeley | Statistics | SV
Yuan | Ming | M | Georgia Institute of Technology | Industrial and Systems Engineering | NRC
Zeitouni | Ofer | M | Minnesota U | Mathematics | SC
Zhang | Liang | M | Duke U | Statistics and Decision Sciences | WG
Zhao | Yufan | M | U of North Carolina-Chapel Hill | Biostatistics | GF
Zhou | Xin | M | Duke U | Mathematics | WG
Zhu | Ji | M | U of Michigan | Statistics | WGR
Zhu | Zhengyuan | M | U of North Carolina-Chapel Hill | Statistics and Operations Research | WG

Summer Program on Multiplicity and Reproducibility in Scientific Studies

Program Core Participants and Targeted Experts

Last Name | First Name | Gender | Affiliation | Department | Status
Arani | Ramin | M | Bristol-Myers Squibb | | WGR
Bayarri | Susie | F | U of Valencia | Statistics | SV
Berger | Jim | M | SAMSI and Duke U | Statistics | WG
Dickhaus | Thorsten | M | German Diabetes Center | Biometrics and Epidemiology | NRC
Finner | Helmut | M | German Diabetes Center | Biometrics and Epidemiology | SC
Guindani | Michele | M | U of Texas M.D. Anderson Cancer Center | Biostatistics and Applied Mathematics | WGR
Hoff | Peter | M | U of Washington | Statistics and Biostatistics | SV
Irony | Tebla | F | Food and Drug Administration | Devices and Radiological Health | WGR
Jang | Woncheol | M | Duke U | Statistics and Decision Sciences | PA
Johnson | Valen | M | U of Texas M.D. Anderson Cancer Center | Biostatistics and Applied Mathematics | SV
Laud | Prakash | M | Medical College of Wisconsin | | SV
Liu | Delong | M | CIIT Centers for Health Research | | WG
Markatou | Marianthi | F | Columbia U | Biostatistics | WGR
Mueller | Peter | M | U of Texas M.D. Anderson Cancer Center | Biostatistics | SC
Obenchain | Robert L. (Bob) | M | Eli Lilly and Company | Outcomes Research, US Medical | SC
O'Rourke | Keith | M | Ottawa Health Research Institute | Clinical Epidemiology Program | SV
Rice | Kenneth | M | U of Washington | Biostatistics | NRC
Rodriguez | Abel | M | Duke U | Statistics | WG
Rosner | Gary | M | U of Texas M.D. Anderson Cancer Center | Biostatistics | WGR
Sarkar | Sanat | M | Temple U | Statistics | SV
Scott | James | M | Duke U | Statistics | WG
Shaffer | Juliet | F | U of California-Berkeley | Statistics | SC
Siroky | David | M | Duke U | Political Science | WG
Sivaganesan | Siva | M | U of Cincinnati | | SC
Sun | Lei | F | U of Toronto | Public Health Sciences and Statistics | NRC
Tractenberg | Rochelle | F | Georgetown U School of Medicine | Neurology; Biostatistics; Psychiatry | NRC
Umbach | David | M | National Institutes of Health | Environmental Health Sciences | WGR
Yekutieli | Daniel | M | Tel Aviv U | Statistics and Operations Research | NRV
Young | Stan | M | National Institute of Statistical Sciences | | FA
Zhou | Huibin (Harry) | M | Yale U | Statistics | SV

B. Postdoctoral Fellows

This section includes the postdoctoral fellow selection and mentoring processes at SAMSI and synopses of the activities of the 2005-6 and 2006-7 SAMSI Postdocs from their own perspectives with commentaries by their mentors. Section B.1 describes the SAMSI activities and strategies for effective selection and mentoring; Section B.2 contains the mid-program reports by SAMSI Postdocs; Section B.3 contains the activity reports for SAMSI Postdocs during this grant year; and Section B.4 tracks previous Postdocs and follow-up evaluations.

SAMSI 2006-7 Postdocs and Postdoctoral Associates and their mentors are presented below.

Ariel Cintron-Arias (Ph.D., Applied Mathematics, )
SAMSI Programs:
Development, Assessment and Utilization of Complex Computer Models:
- Dynamics of Infectious Disease
- Methodology
Research Mentor: H.T. Banks and Peter Reichert
Administrative Mentor: Ralph Smith

James Crooks (Ph.D., Physics, University of North Carolina, Chapel Hill)
SAMSI Programs:
Development, Assessment and Utilization of Complex Computer Models:
- Terrestrial Models
- Dynamics of Infectious Disease
- Methodology
Astrostatistics:
- Exoplanets
Research Mentor: Jim Clark
Administrative Mentor: Jim Berger

Cari Kaufman (Ph.D., Statistics, Carnegie Mellon University)
SAMSI Programs:
Development, Assessment and Utilization of Complex Computer Models:
- Climate and Weather
- Methodology
- Terrestrial Models
High-Dimensional Inference and Random Matrices:
- Climate and Weather
- Regularization and Covariance
Research Mentor: Jonathan Rougier
Administrative Mentor: Jim Berger

Manjunath Krishnapur (Ph.D., Probability Theory, University of California - Berkeley)
SAMSI Programs:
High-Dimensional Inference and Random Matrices:
- Universality in Random Matrix Theory
Research Mentor: Peter Miller
Administrative Mentor: Chris Jones

Jayanta Pal (Ph.D., Statistics, University of Michigan)
SAMSI Programs:
High-Dimensional Inference and Random Matrices:
- Multivariate Distributions
- Geometric Methods
Research Mentor: Don Richards
Administrative Mentor: Chris Jones

Bala Rajaratnam (Ph.D., Statistics, Cornell University)
SAMSI Programs:
High-Dimensional Inference and Random Matrices:
- Graphical Model/Bayesian Methods
- Regularization and Covariance
- Multivariate Distributions
- Geometric Methods
Research Mentor: Hélène Massam
Administrative Mentor: Chris Jones

Elaine Spiller (Ph.D., Applied Mathematics, Northwestern University)
SAMSI Programs:
Development, Assessment and Utilization of Complex Computer Models:
- Granular Materials
- Methodology
- Climate and Weather
High-Dimensional Inference and Random Matrices:
- Universality in Random Matrix Theory
Research Mentor: Bruce Pitman
Administrative Mentor: Chris Jones

Guillaume Vernieres (Ph.D., Oceanography w/ Minor in Mathematics, Oregon State University)
SAMSI Programs:
Development, Assessment and Utilization of Complex Computer Models:
- Dynamics of Infectious Disease
- Climate and Weather
Research Mentor: Pierre Gremaud
Administrative Mentor: Chris Jones

Gentry White (Ph.D., Statistics, University of Missouri, Columbia)
SAMSI Programs:
Development, Assessment and Utilization of Complex Computer Models:
- Engineering Methodology
Research Mentor: Tom Santner
Administrative Mentor: Ralph Smith

B.1. Overview of Postdoc Selection, Postdoc Activities and Mentoring Strategies

The SAMSI Postdoctoral Fellowship experience is designed to bring together Statisticians and Applied Mathematicians in formal integrated research settings (e.g., Working Groups), informal settings (e.g., Lunches, seminars and events for undergraduates), and in opportunities for collaborations with researchers in other scientific disciplines.

The focus on integrating statistical and applied mathematical aspects in SAMSI programs begins with the Postdoc selection process. During the 2006-7 grant year, candidates applied to participate in the 2007-8 SAMSI Programs (Random Media; Risk Analysis, Extreme Events and Decision Theory; and/or Environmental Sensor Networks). The recruiting process involved SAMSI researchers, the SAMSI Directorate, and advertisement on the SAMSI web pages; in addition, the 2007-8 Program Leaders and the Scientific Advisory Committee were invited to assist in bringing the SAMSI opportunities to the attention of promising doctoral candidates working in program-relevant areas of research. The success of the increased involvement of 2006-7 Program Leaders in recruiting postdocs for the Random Matrices Program led to expansion of Program Leaders’ roles in postdoc selection for 2007-8. For the Random Media and Risk programs, the Program Leaders were invited to review postdoc applications and to recommend candidates for interview. Candidates for Sensor Networks with potential second-year support from a Program Leader were also interviewed individually by the Program Leader. Inclusion of the Program Leaders in the active interview process has proved invaluable, especially when the Program Leader(s) brought the requisite expertise for evaluating a candidate’s dissertation research. Another change in the recruiting process for 2007-8 was the frequent (but not exclusive) use of WebEx plus a teleconference connection to permit interviews to be conducted remotely. Use of the remote connection also made it possible for interested Program Leaders to view the candidates’ interview talks and participate in the discussion of that work. This expanded role for the Program Leaders has been successful in its intent: to give special attention to assuring good matches between the candidates selected and their possible mentors, both during the initial year at SAMSI and during the second year of fellowship. The final decisions about postdoctoral fellowships continue to rest with the SAMSI Directorate, although for 2007-8 the careful consideration by the Program Leaders has led to happy consensus decisions.

When Postdocs first arrive at SAMSI they become part of a Postdoc Community that in addition to SAMSI Postdocs includes NISS Postdocs and other young researchers in the NISS-SAMSI complex. This lively Community has monthly Postdocs Lunches with the Directorate where topics often include the practicalities of an academic or a research career (how to interview successfully for a position, how to plan and write a research proposal, how the publication process works in the mathematical sciences from journal selection through interpretation of written reviews to successful revision). A Biweekly “Postdoc Pizza and Presentation” seminar with “practice job interview” presentations of research results to the Postdoc Community (and interested graduate students) serves to refine presentation skills at the same time that it serves an interdisciplinary role to inform Postdocs coming from different disciplines and/or working on different SAMSI Programs. Other Postdoc responsibilities include assisting with the SAMSI Undergraduate workshops, where Postdocs continue to be the most effective presenters for students of this age.

Effective mentoring of Postdocs is an essential part of SAMSI’s mission, so each Postdoc acquires two mentors. The first is a Research Mentor, commonly the Working Group Leader of the Postdoc’s principal Working Group. The second is a member of the Directorate whose natural role is to be a second, non-technical pair of ears and a second personality with knowledge of local issues and general SAMSI information. This second mentorship also connects the Directorate in a personal, non-evaluative way to Postdoc life at SAMSI. In their personal evaluations, SAMSI Postdocs have continued to report that they feel well supported by this dual-mentor system and by both of their particular mentors.

B.2. 2006-7 SAMSI Postdoc Activities

In mid January SAMSI Postdocs summarize their experiences in Mid-Program Activity Reports. In April SAMSI Postdocs summarize their experiences in Activity Reports and they also provide evaluations of their postdoc experiences. In Section B.3.1 each Postdoc has identified their Working Groups (primary and secondary) and other particular activities. Then follow the synopses of their research work, their accomplishments to date and their longer-term research agenda deriving from this research for each primary Working Group. Finally a list of publications, works in preparation and presentations at conferences, etc., is given. The Research Mentor’s commentary on the Postdoc’s work follows this self-evaluation. Section B.3.2 contains each current Postdoc’s responses to the 10-question evaluation of the SAMSI Postdoc experience.

B.2.1 Postdoc Mid-Program Research Reports

Ariel Cintron-Arias

My appointment at SAMSI started in September 2006. I joined the year-long program titled “Development, Assessment and Utilization of Complex Computer Models”. I am a member of two working groups: “Methodology” and “Models of Cerebral Blood Flow”. I am the webmaster of the cerebral blood flow working group. In addition, I audited a SAMSI course in “Flowing Granular Materials”. I also joined a seminar on generalized sensitivity, which was held at North Carolina State University.

James Crooks I am participating, at some level, in four working groups, one from last year's Astrostatistics program, and three from this year's Computational Modeling program. Astrostatistics -- Exoplanets Group I am developing a novel method of importance sampling for the situation where you have a set of points randomly sampled from the density proportional to the function you wish to integrate. These integrals are to be used to calculate the Bayes' factors needed to determine the relative odds of a target solar system containing n planets vs. m planets. However, the method can be used any time there is a random sample from a function to be integrated. Currently I am checking the method I developed against a variety of known test functions to evaluate its accuracy. I estimate it will be ready for write-up by the end of the semester. Computational Modeling Terrestrial Models Group/Methodology Group The Terrestrial Models group is using a large, expensive-to-run simulator of forest dynamics with an eye toward investigating the effects of future climate change. The idea is to run the model under current climate conditions until an equilibrium is reached, then branch the simulator into various greenhouse scenarios (as predicted by General Circulation Models). Due to the computational expense there is a need to find a quick way to emulate the model; however, the output is stochastic and there is no published method for interpolating between input parameter values (e.g., temperature) when the output is a sample from an unknown distribution. I am developing a method to do exactly this based on Gaussian processes. The method itself is currently a work in progress, and it is a very frequent subject of discussion in the Methodology Group (which has become more of a clearinghouse for ideas than a 'working' group). 
Infectious Disease Modeling Group I only recently joined this group, and since it meets monthly and has only just begun to meet under the SAMSI aegis I have yet to attend a meeting.
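
The exoplanet project above integrates a function using points that were already sampled from a density proportional to it. The following is a minimal sketch of one generic way to do this, not the method being developed in the report: Gaussian kernels are centred on the existing sample to form an importance-sampling proposal. The function name, the choice of Gaussian kernels, and the bandwidth value are all assumptions made for illustration.

```python
import numpy as np


def kernel_importance_integral(f, samples, n_draws=5000, bandwidth=1.0, seed=0):
    """Estimate Z = integral of f(x) dx by importance sampling.

    `samples` are points assumed to be drawn from a density proportional
    to f. The proposal q is an equal-weight mixture of isotropic Gaussian
    kernels centred on those points (an illustrative choice only).
    """
    rng = np.random.default_rng(seed)
    samples = np.atleast_2d(samples)
    n_centres, dim = samples.shape

    # Draw from the mixture: pick a centre uniformly, then add kernel noise.
    idx = rng.integers(n_centres, size=n_draws)
    draws = samples[idx] + bandwidth * rng.standard_normal((n_draws, dim))

    # Evaluate the mixture density q at every draw.
    sq_dist = ((draws[:, None, :] - samples[None, :, :]) ** 2).sum(axis=2)
    log_norm = -0.5 * dim * np.log(2.0 * np.pi * bandwidth ** 2)
    q = np.exp(log_norm - 0.5 * sq_dist / bandwidth ** 2).mean(axis=1)

    # The average of f/q over draws from q is an unbiased estimate of Z.
    return (f(draws) / q).mean()


if __name__ == "__main__":
    # Toy check: f is an unnormalised 2-D standard normal, so Z = 2*pi.
    rng = np.random.default_rng(1)
    pts = rng.standard_normal((100, 2))
    f = lambda x: np.exp(-0.5 * (x ** 2).sum(axis=1))
    print(kernel_importance_integral(f, pts), 2.0 * np.pi)
```

The bandwidth has to be wide enough that the proposal tails are heavier than those of f; otherwise the weights f/q can have very large variance.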

Cari Kaufman As part of the computer models program at SAMSI, I am participating in three working groups. Climate Working Group With Stephan Sain at the National Center for Atmospheric Research, I am modeling the temperature output of regional climate models run under varying boundary conditions and scenarios. We have carried out preliminary ANOVA modeling of average regional temperature; however, we plan to extend this to the spatial domain by using Gaussian process priors on the functional main effects and interactions. We have tested this functional ANOVA technique on simulated data and are exploring the connections with existing techniques using splines. Methodology Working Group With Derek Bingham at Simon Fraser University, I am developing approximations to the usual Gaussian process emulation techniques that rely on compactly supported covariance functions. We have tested several variants on simulated data. In April, I will visit Simon Fraser and we will begin applying these methods to computer code for an astrophysics model. We are also trying to extend previous theoretical results on covariance tapering to the case of product covariance functions.

Terrestrial Models Working Group With Jim Clark’s forest ecology group at Duke University, I am building a model to interpolate soil moisture over space and time, given topography, climate variables, and spatially and temporally irregular soil moisture measurements. Our activities to date have mainly included discussing the functional form of the model and gathering the disparate types of data from various sources.

As part of the random matrices program at SAMSI, I participated in two working groups. Climate and Weather Working Group This was a reading group that I led. The group focused primarily on principal component analysis and the ways it is used by climate scientists. In particular, we studied methods for detection and attribution of climate change. Regularization Working Group I participated in the weekly meetings of this group and gave a presentation on covariance tapering as a regularization technique. I am also pursuing two areas of personal research. Covariance Tapering I am finishing a paper with Doug Nychka and Mark Schervish based on my thesis research. Kriging with Estimated Parameters I am studying whether recent results on almost sure convergence of the MLE under the Matérn covariance model can be used to extend previous results on the consistency and asymptotic efficiency of the kriging predictor under a misspecified model to a particular case in which the covariance function is estimated. Papers in progress • Kaufman, C., Schervish, M., and Nychka, D. Covariance tapering for likelihood-based estimation in large spatial datasets. • Kaufman, C. and Sain, S. Functional ANOVA modeling of regional climate model experiments. • Kaufman, C. and Bingham, D. Efficient emulators of computer code using covariance tapering. Presentations • Covariance tapering for likelihood-based estimation in large spatial datasets. o Statistics Department, Colorado State University, Fort Collins, CO, November 2006. o Institute of Statistics and Decision Sciences, Duke University, Durham, NC, October 2006. o Regularization Working Group, Statistical and Applied Mathematical Sciences Institute, Research Triangle Park, NC, October 2006. o Multivariate Methods in Environmetrics Conference, Chicago, IL, October 2006. (poster) • Efficient emulators of computer code using covariance tapering.
o Methodology Working Group, Statistical and Applied Mathematical Sciences Institute, Research Triangle Park, NC. o Department of Statistics and Actuarial Science, Simon Fraser University, Vancouver, British Columbia, Canada, April 2007. (scheduled) • Methods of covariance estimation in large climate datasets. Joint SAMSI/NCAR Workshop on Random Matrices, National Center for Atmospheric Research, Boulder, CO, May 2007. (scheduled) • Functional ANOVA Modeling of Regional Climate Model Experiments. Joint Statistical Meetings, Salt Lake City, UT, July 2007. (scheduled)
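
The covariance tapering idea mentioned in this report can be illustrated with a small sketch: a covariance function is multiplied elementwise by a compactly supported correlation function, so that entries for distant pairs become exactly zero and the matrix can be stored in sparse form. The exponential covariance, the Wendland-type taper, and all parameter values below are illustrative assumptions, not the specific choices made in the papers listed above.

```python
import numpy as np
from scipy import sparse


def exponential_cov(dist, sill=1.0, range_=0.3):
    """Exponential covariance; sill and range_ are illustrative values."""
    return sill * np.exp(-dist / range_)


def wendland_taper(dist, taper_range):
    """Compactly supported correlation (1 - r)^4 (4r + 1) for r < 1, else 0."""
    r = dist / taper_range
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)


def tapered_covariance(coords, taper_range=0.2):
    """Return the tapered covariance matrix of the points as a sparse matrix.

    The dense distance matrix is built here only for clarity; for large
    datasets one would find neighbouring pairs without forming it.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # The elementwise product of two valid covariances is a valid covariance.
    tapered = exponential_cov(dist) * wendland_taper(dist, taper_range)
    return sparse.csr_matrix(tapered)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.random((500, 2))  # 500 random sites in the unit square
    K = tapered_covariance(coords)
    print("fraction of nonzero entries:", K.nnz / (500 * 500))
```

A shorter taper range gives a sparser matrix and cheaper linear algebra, at the cost of a cruder approximation to the untapered covariance.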

Jayanta Kumar Pal SAMSI workshops I am working in the High Dimensional Inference and Random Matrices program at SAMSI for the year 2006-07. I helped organize the opening workshop in that program, as well as the workshops on Bayesian challenges and large graphical models. In addition, the following working groups meet every week, and I am actively participating in research in each of them. • Multivariate distributions: We discuss many relevant research areas, including eigenvalue distributions, zonal polynomials, and wireless communications. I am actively collaborating with Professor Donald Richards on monotone incomplete data on Wishart matrices. I also meet with faculty members in the Duke ECE department to discuss issues in wireless communications. We hope to co-author a paper on the statistical challenges in that area. • Geometric methods: We concentrate on the geometry of the manifold of positive definite matrices and discuss probability distributions on that space using logarithmic distances. I am co-authoring a paper with Makram Talih on developing methods to estimate covariance matrices in that space. Presentations I gave a talk on my research at the Joint Statistical Meetings in Seattle, August 2006. I also gave a talk on wireless communications and the use of representation theory in random matrices at SAMSI in November 2006. I have been invited to give a talk on my research at the University of Georgia in March 2007. I also plan to visit the University of Michigan during the next 6 months and to attend the Joint Statistical Meetings 2007 in Salt Lake City. There will also be a transition workshop at AIM in Palo Alto in April 2007. I plan to talk about my work in multivariate methods at these meetings. Individual work I continue my work on non-parametric inference from my dissertation with Michael Woodroofe. I submitted two papers on density estimation, one to the Annals of Statistics and one to Statistics and Probability Letters, in the last 3 months.
Both of them are solo papers and the major work has been done in the first few months of my appointment. I am also co-authoring a paper with Mary Meyer of University of Georgia on Spline-based density estimation.

Bala Rajaratnam Fall 2006 Working Groups A. Graphical Model/Bayesian Methods: I have been regularly attending meetings for this group and took part in lively research discussions. I was the webmaster for this group and co-organized the Bayesian focus week meeting with Helene Massam. I was the only post-doctoral fellow in the Random matrix program to co-organize a workshop during the first semester. I am collaborating with Helene Massam (who is also the group leader for the project) on a forthcoming paper. B. Regularization and Covariance: I am collaborating with Debashis Paul and Helene Massam on regularization techniques for likelihood inference problems. C. Multivariate Distributions: I was invited to give the first presentation to the working group, with a Webex connection to Stanford/Berkeley, MIT and other parts of the US. As with the other working groups, I have been involved in the lively discussions. D. Geometric methods: I have been regularly attending meetings for this group and took part in lively research discussions. Random Matrix Course In addition to involvement in the working groups, I was the sole post-doctoral fellow to take part in teaching the Random matrix course. With guidance from Prof. Jack Silverstein, I had to improvise at short notice to prepare to teach a part of the class. I was primarily involved in teaching some of Alan Edelman and Raj Rao’s recent work, including the Random matrix theory tool, to the graduate class. Presentations • Marginal likelihood for the eigenvalues of covariance matrices - Presentation to the Multivariate Distribution working group of the Random matrix theory program (Sep 2006). Publications/Work in progress • Rajaratnam, Massam and Carvalho (2007) Flexible Covariance Estimation – to be submitted in mid 2007. • Massam, Paul and Rajaratnam (2007) Regularization for likelihood inference – work in progress – to be submitted in mid 2007. • Chandan K.
Reddy, Hsiao-Dong Chiang and Bala Rajaratnam (2006), “Stability Region based Expectation Maximization for Model-based Clustering”, In proceedings of the IEEE/ACM International Conference on Data Mining (ICDM), Hong Kong, December 2006. (Acceptance Rate is 9.4%).

Elaine Spiller Working groups and projects at SAMSI: 1) Granular Materials – Engineering (CM) Currently working on sampling methods for low probability events with the eventual goal of drawing hazard maps around volcanoes. Also working on fitting a GASP emulator to output from computer experiment runs. 2) Methodology (CM) Work related to project above. 3) Climate (CM) Currently working on a data assimilation/particle filter project tracking a tracer moving in a flow where the dynamics are dictated by point vortices. Primarily this project is with Amarjit Budhiraja at UNC. Also plan to apply similar ideas to a data assimilation/state estimation project at NCAR. 4) Universality (RM) Some work on random matrices that arise when trying to characterize eigenvalues of solutions of the nonlinear Schroedinger equation. This working group is no longer active. Other research projects: Estimating rare events in long haul fiber communication systems and mode-locked lasers using perturbation theory and importance sampling. Talks related to this research: • “Phase noise and rare events in nonlinear lightwave systems” presented at SIAM conference on Nonlinear Waves and Coherent Structures, Seattle WA, September 2006. • “Rare events in dispersion-managed nonlinear lightwave systems” presented at Duke University’s applied math seminar, Durham NC, October 2006. • “Rare events in dispersion-managed nonlinear lightwave systems” to be presented at SIAM conference on Applications of Dynamical Systems, Snowbird UT, June 2006. Papers related to this research: • “Noise-induced perturbations of dispersion-managed solitons” (submitted to Phys Rev E, January 2007). • “Rare events in phase modulated nonlinear lightwave systems” (in preparation) Educational and outreach activities this year: 1) Spoke at and assisted with 2-day undergraduate seminar at SAMSI in November 2006.
2) To participate in a panel at Elon University’s “Professional discovery week” entitled “On becoming a mathematician” in February 2007.

Guillaume Vernieres Working groups and projects at SAMSI: I am involved with 2 working groups in the Development, Assessment and Utilization of Complex Computer Models program. I am also involved in a data assimilation project with Chris Jones and Kayo Ide. One group has begun working on a paper for publication. I am preparing conference presentations for the other 2 projects. Blood Flow Model Working Group: I am collaborating with Pierre Gremaud, Mette Olufsen, Kristen DeVault and Ariel Cintron-Arias on an inverse problem involving a model of the blood flow in the Circle of Willis and blood velocity measurements in the brain. We have started writing a paper that will be submitted to the Journal of Physiology. Climate and Weather Working Group: I am the leader of the Planetary Boundary Layer subgroup, collaborating with Montse Fuentes and Josh Hacker. We are working on a parameter estimation problem involving a complex column model of the lower atmosphere (Weather Research Model from NCAR) and surface data. We will be presenting early results at NCAR in May. Lagrangian Data Assimilation in the Gulf of Mexico Project: This project is in collaboration with my mentor Chris Jones and Kayo Ide from UCLA. We will be presenting some early results at MSRI and the Stennis Space Center (NASA) in March, followed by a poster presentation at Oregon State University in April.

B.3.1 Postdoc Research Reports with Mentors’ Comments

Ariel Cintron-Arias – Activity Summary Working Groups Dynamics of Infectious Diseases: I am a co-organizer of this SAMSI working group. I am the webmaster of the group's webpage [http://www4.ncsu.edu/~acintro/samsi_dyn_infc/dyninf_index.html]. In addition, my involvement in the group's monthly meetings has entailed leading discussions about basic deterministic epidemic modeling, exploration of public sources of epidemiological data, and estimation of influenza reproductive numbers from empirical data. Methodology: I attend the group’s meetings regularly and follow the suggested literature. Current Projects “Estimation of seasonal influenza reproductive numbers”: The effective reproductive number R_t is a fundamental parameter in the study of transmission dynamics of infectious diseases. We propose an analysis of seasonal influenza epidemics based on estimating R_t from longitudinal incidence data in the United States. “Analysis of oscillatory patterns in disease transmission”: The impact of heterogeneous contacts on oscillatory patterns of new cases of infection is assessed. Longitudinal measles case reports are employed in this study. Presentations • SAMSI postdoc seminar. “Modeling and Parameter Estimation of Contact Processes”, November 2006. • Invited speaker at the "SAMSI Two-Day Undergraduate Workshop" that was held during Mar 2-3, 07. • Poster presentation at a conference called “Opportunities in Mathematical Biology for Underrepresented Groups” held at the Mathematical Biosciences Institute of Ohio State University, during Mar 22-25, 07. • Invited speaker at the "Joint SAMSI/MUCM Mid-Program Workshop" during Apr 2-3, 07. • Invited speaker at the Comp-Mod Transition Workshop during May 14-16, 07. Other activities I am involved in leading tutorials and serving as an academic mentor in the SAMSI/CRSC Undergraduate Workshop to be held on May 21-25, 2007.

Publications in preparation • “Estimation of seasonal influenza reproductive numbers”. In collaboration with H. T. Banks, A. Lloyd, C. Castillo-C., and L. Bettencourt. The goal of this study is to estimate epidemiological time-dependent parameters from influenza incidence data. • “Analysis of oscillatory patterns in disease transmission”. In collaboration with H.T. Banks, A. Lloyd, and P. Reichert. This study intends to explore the role of transmission uncertainty in the oscillatory patterns of new cases of infection. • “The role of nonlinear relapse on contagion amongst drinking communities”. In collaboration with C. Castillo-C., X. Wang, and F. Sanchez. The dynamics of drinking communities are modeled with both deterministic differential equations and stochastic network contagion models. The role of nonlinear relapse rates in the asymptotic dynamics is assessed.

Ariel Cintron-Arias – H.T. Banks’ and Peter Reichert’s Commentaries H.T. Banks’ Commentaries I served as one of Ariel’s scientific mentors during his year as a postdoc at SAMSI and have the following summary of his activities and growth during the year. He joined SAMSI in September 2006. He has participated significantly in numerous activities during the year including: [1] Co-organizer of a SAMSI working group called "Dynamics of Infectious Diseases". He served as the webmaster of the group's webpage [http://www4.ncsu.edu/%7Eacintro/samsi_dyn_infc/dyninf_index.html]. Part of his involvement in the group's monthly meetings has entailed leading discussions about basic deterministic epidemic modeling, exploration of public sources of epidemiological data, and estimation of influenza reproductive numbers from empirical data. He has also participated very actively in a weekly working group meeting with the local subgroup of this working group. [2] He volunteered as a speaker at the "SAMSI Two-Day Undergraduate Workshop" that was held during Mar 2-3, 07. [3] He presented a poster at a conference [Opportunities in Mathematical Biology for Underrepresented Groups] held at the Mathematical Biosciences Institute during Mar 22-25, 07. [4] He was an invited speaker at the "Joint SAMSI/MUCM Mid-Program Workshop" during Apr 2-3, 07. [5] He is one of the mentors in the "2007 CRSC/SAMSI Undergraduate Workshop". His duties include lecturing about linear inverse problems, assisting in computer labs, and giving academic advice concerning graduate school applications. [6] He will be an invited speaker at the CompMod Transition Workshop during May 14-16, 07. In addition to these group related activities, his personal growth in research topics has been significant. He has worked on several research projects including efforts on statistical inverse algorithms with Banks and Lloyd in influenza models and with Peter Reichert, Lloyd and Banks on stochasticity in seasonal models of measles.
Two research papers should be completed by Summer, 2007.

Peter Reichert’s Commentaries We started our collaboration some months ago. The goal of this project is to try to separate year to year variation of the transmission rate in a simple epidemiological model from seasonal variation. Technically, this is done by estimating the state of a time-dependent, stochastic parameter in the periodic (with seasonal variation) transmission rate of the model. So far, two different epidemiological models have been implemented and test evaluations have been done with influenza and measles incidence data. The project aims at evaluating and interpreting the results of long term observation data. Of particular interest is the comparison of the evaluations of a simple model without chaotic behavior and with a temporally varying transmission rate against a somewhat more complex model that explains year to year variation by chaotic behavior of the solutions of the differential equation without time-dependent parameters (with strictly periodic transmission rate).

James Crooks – Activity Summary Primary Working Groups Methodology: This group is a sort of umbrella group for the rest of the program; it is mostly a talking shop for problems of general interest that arise in the other groups. On about half a dozen occasions I have presented my work on a problem raised in the Terrestrial Models group regarding how to interpolate stochastic model output. I have also attended the two Workshops under the aegis of this group. Terrestrial Models: This group uses a forest stand simulator to model interspecies interactions and climatic effects. My project involves looking at the far from equilibrium dynamics of the forest under various climate change scenarios. I am also particularly interested in emulating this simulator. I am the webmaster for this group, and I have presented my work to it on four or five occasions. Furthermore, I recently gave a talk to the Mid-Program Workshop on the emulation issue. Secondary Working Group Infectious Disease Dynamics: This group is unlike the others in that it meets only monthly, but for an entire day, and is so large that it acts more like a regular workshop than a group meeting. I am a regular attendee at the meetings, and though I am not actively pursuing research in this area, I am regularly advising fellow postdoc Ariel Cintron-Arias on Markov chain Monte Carlo techniques and other facets of Bayesian methodology. Current Projects Stochastic Model Interpolation: This is my main project at the moment as it is relevant to both of my primary working groups. The forest stand simulator is slow to run for a reasonably sized forest, and the stochastic output does not follow a simple distribution. Thus, interpolation is both extremely important and extremely difficult to perform. I am attempting to combine ideas from Gaussian process regression, Dirichlet processes, and copulas into a probability model that allows sharing of information about model output CDFs across the model input space.
This is quite a difficult task, though I intend to get a paper out of it once it is further developed. Related to this is my project using the forest simulator to perform prediction under various climate change scenarios. Which species can survive hot, dry summers and warm, wet winters? What degree of climatic variability can different species handle? Which species will outperform others? Are there major phase transitions in species diversity or distribution? Importance Sampling for Extrasolar Planet Detection: This is a project that stems from a former SAMSI working group under last year’s Astrostatistics program. In order to compare evidence for different numbers of planets around a star for which we have time course data we must calculate Bayes’ factors. However, the required numerical integrals are (2+5n) dimensional, where n is the number of planets. The idea I am exploring is to use Markov chain Monte Carlo output, a sample from a distribution proportional to the function I want to integrate, to guide the choice of kernel locations for a kernel-based importance sampling algorithm. Using simulated data I have tested my refined algorithm in up to 12 dimensions with results that are quite accurate. This project is quite close to the stage at which a paper can be written. Presentations Besides monthly or biweekly research updates to my two main working groups, I also gave a formal talk to the Terrestrial Models Mid-Program Workshop on April 4th entitled, “Progress in Interpolating Stochastic Model Output.” Activities • Summer School on Design and Analysis of Computer Experiments, August 11-16, Simon Fraser University, B.C., Canada.
• Program on Development, Assessment and Utilization of Complex Computer Models Opening Workshop, September 10-14, 2006, SAMSI • Program on High Dimensional Inference and Random Matrices Opening Workshop and Tutorials, September 17-20, 2006, SAMSI • Program on Development, Assessment and Utilization of Complex Computer Models Joint Engineering and Methodology Subprograms Workshop, October 26-27, 2006, SAMSI • Program on Development, Assessment and Utilization of Complex Computer Models Biosystems Modeling Workshop, March 5-7, 2007, SAMSI • Program on Development, Assessment and Utilization of Complex Computer Models SAMSI/MUCM Mid-Program Workshop, April 2-3, 2007, SAMSI • Program on Development, Assessment and Utilization of Complex Computer Models Terrestrial Mid-Program Workshop, April 4, 2007, SAMSI

Jim Crooks -- Jim Berger’s Commentary As administrative mentor (and partly as scientific mentor), I meet with Jim at least bi-weekly. On the administrative side, everything has proceeded smoothly. Jim has been fully engaged in the Computer Modeling program, and seems to be having significant scientific mentoring from the leaders of the working groups in which he participates. I have also been acting as a secondary scientific mentor for two reasons. First, Jim began working at SAMSI last summer, during the final phases of the Astrostatistics program, and became engaged with research from that program involving methodology for model selection related to the search for Exoplanets. This was a research interest of mine, so I became his scientific mentor on this problem. He has been exploring methodologies based on a unique combination of Markov Chain Monte Carlo analysis and importance sampling, and has made excellent progress in assessing the potential of the approach. I have also been acting as secondary scientific mentor for his work in the Computer Models program, because Jim came from a background in astrophysics and so has gaps in his statistical knowledge. In our meetings we discuss his activities in the working groups, and I suggest relevant statistical background that he should study, as well as suggest ideas for solution of the research problems he is attacking. In particular, his primary focus has been on building emulators for output distributions of stochastic simulators arising in computer models of terrestrial processes, and he is well on the way to developing a successful emulator.

Cari Kaufman – Activity Summary Primary Working Groups Computer Models Methodology: With Derek Bingham at Simon Fraser University, I am developing approximations for Gaussian process emulation that rely on compactly supported covariance functions. Climate (Computer Models Program): With Stephan Sain at the National Center for Atmospheric Research, I am modeling the temperature output of regional climate models run under varying boundary conditions and scenarios. Terrestrial Models: With Jim Clark’s forest ecology group at Duke University and Jonty Rougier, visiting from Bristol University, I am developing a model to predict soil moisture under varying climate scenarios. Secondary Working Groups Climate and Weather: (Random Matrices Program) I led this reading group. The group focused primarily on principal component analysis and the ways it is used by climate scientists. In particular, we studied methods for detection and attribution of climate change.

Regularization: I participated in the weekly meetings of this group and gave a presentation on covariance tapering as a regularization technique. Current Projects Computer Models Methodology: With Derek Bingham at Simon Fraser University, I am developing approximations for Gaussian process emulation that rely on compactly supported covariance functions. We are extending the covariance tapering technique I studied in my thesis work to the case of nonisotropic covariance functions, which are commonly used in constructing statistical emulators of computer code. We have developed two approaches to this problem, one which adaptively tapers in each input dimension, and one which applies tapering after a transformation of the input space. We have carried out simulation studies which demonstrate the effectiveness of each of these approaches. We are also attempting to extend previous theoretical results on covariance tapering to the nonisotropic case. Climate (Computer Models Program): With Stephan Sain at the National Center for Atmospheric Research (NCAR), I am analyzing the output of a computer experiment that crosses two regional climate models (RCMs) with boundary conditions from two global climate models (GCMs) under two emissions scenarios. This 2 RCM × 2 GCM × 2 scenario experiment lends itself well to ANOVA modeling of the mean response, and it is an interesting statistical problem because the output of the computer model is functional. We have carried out preliminary ANOVA modeling of average regional temperature, and we are currently extending this to the spatial domain by using Gaussian process priors on the functional main effects and interactions. Terrestrial Models: With Jim Clark’s forest ecology group at Duke University and Jonty Rougier, visiting from Bristol University, I am developing a model to predict soil moisture under varying climate scenarios.
The goal of this project is to provide input soil moisture values to the Clark group’s forest simulator, driven by temperature and precipitation data from a climate model. Our soil moisture model relies on a state-space representation, with a transition density for soil moisture which incorporates some basic physics. We have simulated soil moisture according to the model and plan to calibrate it using time-domain reflectometry data from the Coweeta Long Term Ecological Research Lab. For this purpose we will use particle filtering, a technique I used in my work on neuroprosthetic devices with Rob Kass and Valérie Ventura while I was a graduate student at Carnegie Mellon University. Papers In Progress • Kaufman, C., Schervish, M., and Nychka, D. Covariance tapering for likelihood-based estimation in large spatial datasets. • Kaufman, C. and Sain, S. Functional ANOVA modeling of regional climate model experiments. • Kaufman, C. and Bingham, D. Efficient emulators of computer code using covariance tapering. Presentations • Covariance tapering for likelihood-based estimation in large spatial datasets. o Statistics Department, Colorado State University, Fort Collins, CO, November 2006. o Institute of Statistics and Decision Sciences, Duke University, Durham, NC, October 2006. o Regularization Working Group, Statistical and Applied Mathematical Sciences Institute, Research Triangle Park, NC, October 2006. o Multivariate Methods in Environmetrics Conference, Chicago, IL, October 2006. (poster) • Efficient emulators of computer code using covariance tapering. o Department of Statistics and Actuarial Science, Simon Fraser University, Vancouver, British Columbia, Canada, March 2007. o Joint SAMSI/MUCM Mid-program Workshop, SAMSI, Durham, NC, April 2007. • Functional ANOVA Modeling of Regional Climate Model Experiments. o Joint SAMSI/NCAR Workshop on Computer Models, National Center for Atmospheric Research, Boulder, CO, May 2007.
(scheduled) o Joint Statistical Meetings, Salt Lake City, UT, July 2007. (scheduled) • Methods of covariance estimation in large climate datasets. Joint SAMSI/NCAR Workshop on Random Matrices, National Center for Atmospheric Research, Boulder, CO, May 2007. (scheduled) Other Activities I gave a tutorial on climate models at the SAMSI Two-Day Undergraduate Workshop in March and will present a similar talk at the SAMSI/CRSC Undergraduate Workshop in May. I also maintained the website for the climate and weather working group in the program on random matrices.
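
The soil moisture project above combines a state-space model with particle filtering. The sketch below shows a basic bootstrap particle filter on a toy one-dimensional moisture model. The linear drainage dynamics, the noise levels, and the function and parameter names are all hypothetical and chosen only for illustration; they are not the model described in this report. Missing observations (NaN) are skipped, which mimics temporally irregular measurements.

```python
import numpy as np


def bootstrap_particle_filter(obs, rain, n_particles=2000, drain=0.1,
                              proc_sd=0.02, obs_sd=0.05, seed=0):
    """Filtered means of a toy soil-moisture state x_t in [0, 1].

    Hypothetical model (an assumption, for illustration only):
      x_t = x_{t-1} + rain_t - drain * x_{t-1} + N(0, proc_sd^2), clipped to [0, 1]
      y_t = x_t + N(0, obs_sd^2), with NaN marking a missing observation
    """
    rng = np.random.default_rng(seed)
    particles = rng.uniform(0.0, 1.0, size=n_particles)  # vague initial state
    means = np.empty(len(obs))
    for t, (y, r) in enumerate(zip(obs, rain)):
        # Propagate every particle through the transition density.
        noise = proc_sd * rng.standard_normal(n_particles)
        particles = np.clip(particles + r - drain * particles + noise, 0.0, 1.0)
        if not np.isnan(y):
            # Weight by the observation likelihood, then resample.
            logw = -0.5 * ((y - particles) / obs_sd) ** 2
            w = np.exp(logw - logw.max())
            w /= w.sum()
            particles = rng.choice(particles, size=n_particles, p=w)
        means[t] = particles.mean()
    return means
```

Multinomial resampling at every observed step is the simplest choice; schemes that resample only when the effective sample size drops are common refinements.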

Cari Kaufman – Jonathan Rougier’s Commentary Since I arrived in February, Cari and I have worked together on the problem of modelling soil-moisture as part of the Terrestrial Working Group of the CompMod program. The objective is to provide a soil-moisture model for the SLIP forestry model. Over the past couple of months we have developed and refined a model for soil-moisture with multiple resolutions both spatially and temporally. We have devised a method for calibrating this model sequentially using observational data, and tested it out with synthetic observations. And we have collated and processed data for the Coweeta site: this is a time-consuming and painstaking job for which Cari deserves most of the credit. We have had weekly meetings with other members of the group, including the Biologists and Computer Scientists at Duke. We presented our model and its simulated response to Coweeta topography and forcing at the joint SAMSI/MUCM workshop, and the Terrestrial transition workshop. Following the latter we received valuable feedback regarding the response of the SLIP forestry model to soil-moisture, the role of our model in calibration and prediction, the nature of the observational data, and the layout of the observation sites. Subsequently we have started to explore modifications to our model, and also to do a more detailed analysis of the spatial-temporal structure of the observational data. This is where we are at the moment. Working on this project has allowed us to exercise traditional statistical skills, such as data processing, analysis, and visualization and animation (running to thousands of lines of R code), and modern statistical methods (graph-structured joint distributions, sequential importance sampling), together with more applied maths skills, involving difference equations and ordinary differential equations, and simple physics. We have also discussed wider statistical issues concerning computer experiments, applied Bayesian inference, and foundations.
Our collaboration on this project has been a joint effort throughout. Indeed, Cari has been the voice of sanity on several occasions. Her mathematical skills are excellent, although so far they have not really been taxed in this particular project. In addition, she is a very good programmer, both technically and in the important but often overlooked ‘administrative’ aspects: organizing and processing large collections of files, archiving data, maintaining concurrency. She has valuable experience in handling large spatial-temporal datasets. Her presentation to the SAMSI undergraduate workshop, for which she had clearly put in a great deal of effort, was exemplary. I’m looking forward to continuing our collaboration, particularly as we tackle the more technical aspects of model calibration using a particle filter, for which she has some experience, but I have none.

Manjunath Krishnapur – Activity Summary I was a postdoctoral researcher at SAMSI during the fall of 2006, for the purpose of participating in the program on High Dimensional Inference and Random Matrices. Primary Working Groups I was an active member of the working group on Universality in Random Matrix Theory. Universality, meaning independence of certain asymptotic results from the fine details of the model (e.g., the distribution of entries of a random matrix), is undoubtedly the most important aspect of the theory. This working group was formed under the leadership of Prof. Peter Miller. The group made a list of problems to attack, and in successive weeks various participants gave presentations on the problems so that everyone could know what the issues were. At the same time, we thought about the problems and exchanged ideas at the meetings. I made the following presentations (details can be found on the website). Presentations • Basics of determinantal point processes (β = 2 ensembles). • (A short presentation) on tridiagonal models for β-ensembles for a general potential.
• An invariance principle with applications to random matrix theory and last passage percolation.
• Non-Hermitian random matrices and the circular law.
Research
In addition to making presentations, I worked on several of the problems that were brought out in the working group. Apart from the working groups, I also had very useful discussions with Ofer Zeitouni, Jack Silverstein, James Mingo and Debashis Paul, to name a few. Here are some of my projects/contributions. (Note that these have not yet been published and some are being worked on right now. As always, there is a caveat that everything should be “assumed unproved until proved”!)
• Asymptotic normality for determinantal point processes: In the opening workshop, Ofer Zeitouni posed the problem of showing asymptotic normality for (not too smooth) linear statistics of the Gaussian unitary ensemble. I had an idea that one could solve the problem under the larger banner of determinantal point processes by expressing the cumulants of such a linear statistic in a special form which makes it evident that they must converge to the cumulants of the normal distribution. From discussions with Zeitouni and Virag, it is now clear that this works, at least for eigenvalues in the complex plane, simplifying and generalizing some recent results of Brian Rider and Balint Virag. A paper is under preparation.
• Tridiagonal matrix models for β-Coulomb gases for a general potential: A leading open question in random matrix theory is that of edge-universality for Coulomb gases at any temperature. The cases β = 2, 1, 4 have been settled, but those methods do not apply to other β. Motivated by a question of Mark Huber about simulation techniques for such ensembles, I noticed (Peter Miller also had the same idea) that one could make tridiagonal matrix models whose entries are not independent (as they are for a quadratic potential) but whose dependence is Markovian. This suggests that simulations could be carried out more easily.
• Edge-universality via tridiagonal models?: In a surprising twist, after I left SAMSI, in discussions with Balint Virag, I understood that edge universality may be approachable via tridiagonal models by exploiting the Markovian dependence (following Edelman and Sutton, and Ramirez, Rider and Virag, for the quadratic potential). This is a very exciting prospect and we are working on it currently.
• Circular law and its generalization: The problem of the circular law was advertised by F. Gotze in the opening workshop and we discussed it briefly in the working group. I have been thinking about it since. A significant advance was made by Gotze and Tikhomirov recently. However, based on a recent paper of Tao and Vu, we think that it is now possible to crack the problem fully. I have also obtained a considerably simplified proof of various parts of the somewhat painful proof of the circular law. Further, I was able to make a link with my Ph.D. thesis, where I studied random matrix-valued analytic functions. This leads to a considerable generalization of the circular law. I am currently discussing these matters with Van Vu, and a paper should appear shortly.
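
As an illustration of the quadratic-potential case referred to in the tridiagonal-model items above, the following Python sketch draws eigenvalues from a tridiagonal β-Hermite model. It assumes the standard Dumitriu-Edelman construction, in which the tridiagonal entries are independent; it does not implement the general-potential model with Markovian dependence described in this summary. The function name and the normalization are illustrative choices, not taken from the report.

```python
import numpy as np

def sample_beta_hermite_eigenvalues(n, beta, rng):
    """Eigenvalues of one draw from the tridiagonal beta-Hermite model.

    Assumed construction (Dumitriu-Edelman, quadratic potential only):
    diagonal entries are N(0, 1) and the k-th off-diagonal entry is
    chi_{(n-k)*beta} / sqrt(2), all independent.
    """
    diag = rng.normal(0.0, 1.0, size=n)
    dof = beta * np.arange(n - 1, 0, -1)           # (n-1)*beta, ..., beta
    off = np.sqrt(rng.chisquare(dof)) / np.sqrt(2.0)
    matrix = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(matrix)              # sorted ascending

rng = np.random.default_rng(0)
eigs = sample_beta_hermite_eigenvalues(400, 2.0, rng)
edge = np.sqrt(2.0 * 2.0 * 400)  # spectral edge sqrt(2*beta*n) in this scaling
print(eigs[-1] / edge)           # ratio of largest eigenvalue to the edge
```

Because only 2n - 1 random numbers are needed per draw, a sketch like this scales to any β > 0, which is the practical appeal of tridiagonal models for simulation.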

Manjunath Krishnapur – Peter Miller’s Commentary
It was a great pleasure working with Manjunath Krishnapur (Manju) at SAMSI in the Fall semester of 2006. Here is a brief report on our interaction.
Summary of interactions with the postdoc
Manju and I worked together through the Universality working group. On a purely formal level, Manju served as the webmaster for this group, of which I was the leader. But our interactions went far beyond this level. We met regularly outside of the working group meetings to discuss mathematical matters arising in the working group as well as to plan joint presentations to the group. Manju also regularly attended the graduate course on Random Matrix Theory of which I was one of the instructors. His participation in this course was substantial, as he frequently contributed to the lectures by making useful comments and asking penetrating questions. I mentioned that Manju played an important role in the intellectual life of our working group on Universality. To be more specific, he led the discussion in three of our sessions (out of ten sessions for the whole semester), a frequency which was not exceeded by anybody else in the working group. His subjects were the following (more information is available from the Universality working group’s web page on the SAMSI website):
1. Circular ensembles (eigenvalues of random unitary matrices).
2. An invariance principle with applications to random matrix theory.
3. Non-Hermitian random matrices and the circular law.
Mentor’s assessment of postdoc performance. Manju was an excellent postdoc, and it was a pleasure to work with him at SAMSI.

Postdoc experience. I think that Manju also gained something valuable through his experience at SAMSI in Fall 2006. For one thing, he became quite familiar with the techniques of Riemann-Hilbert analysis that were frequently discussed in the working group. Also, earlier in the semester, he began thinking about beta-ensembles and found some interesting relations between tridiagonal matrix elements in Householder reductions of Hermitian matrices from unitarily invariant but non-Gaussian ensembles, which seem exciting and may lead to some new results. This was, I believe, also a new direction for Manju and indicates the value that a semester in residence at SAMSI had for him.

Jayanta Pal – Activity Summary
I have been involved in the SAMSI program on High Dimensional Inference and Random Matrices in Fall 2006. The program started with an opening workshop at the Radisson Hotel in RTP in September. Leading experts working in this field gathered at SAMSI and focused on the varied challenges that stem from both theoretical aspects and applications of the field.
Working Groups
I participated in three working groups originating from the workshop, at various levels of participation. Several problems proposed in the opening workshop were taken up in the working groups:
• Multivariate distributions: Led by Professor Donald Richards and Professor Iain Johnstone, the working group on Multivariate Distributions mainly looked at theoretical problems related to covariance matrices and eigenvalue distributions. Weekly meetings were held at SAMSI, with remote participants connecting through Webex. The focus was on developing mathematical results from representation theory to derive quantities related to the covariance matrices. Another approach was to use Jack, Schur and zonal polynomials, which helped to compute integrals related to the space of all positive definite matrices and to simplify the expressions to make them tractable.
Current projects
I am collaborating with Professor Donald Richards from Pennsylvania State University on several projects. The work in progress includes developing inference theory for monotone incomplete data. Here we assume that we have observations missing in a specific pattern. We would like to develop the maximum likelihood techniques, derive small sample and large sample properties of the estimators of mean and covariance matrices, and establish comparison techniques for two populations. The classical theory may not be applicable and we may have to develop techniques from Fourier analysis.
Other work includes developing techniques for MIMO (Multiple Input Multiple Output) networks in information technology. Our work is motivated by that of L. Moustakas, S. H. Simon and L. Marinell (Bell Labs). We try to derive techniques from representation theory and compute capacity and information using moment generating functions. I have given a talk on our research in the working group meetings.
• Geometric methods in random matrices: Our working group consists of Dr. Makram Talih from CUNY, Dr. Armin Schwartzman from Harvard, Bala Rajaratnam and Professor Helene Massam. We focused on the geometry of the convex space of positive definite matrices. One of the problems that we worked on was to track the geometric mean of positive definite matrices on that space. Later, we developed techniques for comparing the different subspaces of the convex space and a unified likelihood-based approach for hypothesis testing. For example, one may wish to test whether the underlying covariance matrix in a statistical problem is of a specific shape (independence, for example). Our work in progress also includes defining measures of central tendency and dispersion on that geometric space using differential manifolds. This has applications in brain imaging and MRI data. We will test our techniques in real applications.
• Environmental modeling: I attended weekly meetings with other participants and discussed techniques arising from empirical orthogonal functions (PCA) to analyze data from climate modeling.
Current projects for individual research
• I collaborate with Professor Mary Meyer, of the Statistics department at the University of Georgia. Our project is to estimate density functions under certain shape restrictions using cone programming. We use spline-based techniques where the shape constraints can be incorporated linearly, and so far have been able to derive maximum likelihood and least-squares based techniques.
• Another of my joint projects involves estimation of unimodal densities and hazard functions. This is a collaboration with Dr. Moulinath Banerjee, at the University of Michigan. My collaborators also include Dr. Xiao Wang, of the University of Maryland, in Baltimore. We are working on a project to estimate the dark matter distributions in nearby dwarf spheroidal galaxies. Matthew Walker, an Astronomy graduate student from the University of Michigan, is responsible for the data collection and has been involved in the joint project.

Publications
Submitted during my stay at SAMSI:
• End-point estimation in decreasing densities: a penalized likelihood ratio based approach. Submitted to Scandinavian Journal of Statistics.
• Spiking problem in monotone regression: penalized residual sum of squares. Submitted to Statistics and Probability Letters.
• Estimation of smooth link functions in monotone response models. Submitted to Journal of Statistical Planning and Inference.
For earlier publications, visit www.samsi.info/jpal/publication.html.
Manuscripts in Progress:
• Model-independent Estimates of Dark Matter distributions. With Xiao Wang and Matthew Walker.
• The least squares Regression Spline decreasing density estimator. With Mary Meyer.
• Exact inference for monotone incomplete data. With Don Richards.

Presentations
• I gave a talk at the University of Michigan in December 2006, in a seminar held by the students. The talk was ‘Estimation of eigenvalues for random matrices’.
• I gave a talk at the University of Georgia in March 2007 on ‘Estimation of decreasing densities near end-points’.
• I plan to give a talk at the Transition workshop at AIM in Palo Alto next April. This will be on my work with Don Richards.
• I plan to give a talk at the Joint Statistical Meeting in Salt Lake City in August 2007. This will be on my work with Mary Meyer.
• I plan to give a talk at the Indian Statistical Institute, Calcutta in July 2007. This will be on my work with Don Richards.
• I plan to give a talk at the Indian Institute of Science, Bangalore in July 2007. This will be on my work with Don Richards.
Activities
• I helped organize the inaugural workshop for the HDRM program at the Radisson RTP, in September 2006. I also helped organize the subsequent workshops in the same program, including the Bayesian Focus Week, October 2006, and Graphical Models, November 2006.
• I refereed articles for Statistics and Probability Letters and Journal of Nonparametric Statistics.
• I will participate in the workshop on Geometry and Statistics of Shape Spaces, at the Radisson RTP in July 2007.
• I will participate in organizing the Undergraduate Workshop on Inverse Problems in July 2007.

Jayanta Pal – Don Richards’ Commentary
Interactions with the Postdoctoral Fellow: My interactions with Jayanta went very well last semester. Jayanta was extremely helpful and diligent in maintaining the web page for, and assisting me with coordination of, the SAMSI Working Group on Multivariate Distributions, one of the working groups in the Program on High-Dimensional Inference and Random Matrices. Throughout the semester, I also advised Jayanta in the revision of some papers which he planned to submit for publication. I read his papers and made suggestions as to how he could revise the manuscripts so as to improve their chances for acceptance in major journals. Jayanta has informed me that he has worked on the following papers:
1. Spiking problem in monotone regression: penalized residual sum of squares.
2. End-point estimation of decreasing densities -- asymptotic behavior of penalized likelihood ratio.
3. Estimation of smooth link functions in monotone response models.
4. The least-squares regression spline decreasing density estimator.
5. Model-independent estimates of dark matter distributions.
6. Distributions on the space of positive definite matrices.
Articles 1-3 have been submitted for publication, articles 4-6 are in preparation, and articles 1-2 are sole-authored by him. My assessment of Jayanta's performance is that he has made solid progress in his postdoctoral fellowship. He has had several personal setbacks during the period; while those have inhibited his performance somewhat, I also feel that he has made strong attempts to overcome those setbacks. At present, Jayanta and I are collaborating on imputation methods for monotone incomplete multivariate normal data. I expect that we will be able to produce a fairly comprehensive article on that subject.

Bala Rajaratnam – Activity Summary
Primary working groups
Graphical Model/Bayesian Methods: I have been regularly attending meetings for this group and have taken part in lively research discussions. I was the webmaster for this group and, together with Helene Massam, co-organized the Bayesian Focus Week meeting. I was the only post-doctoral fellow in the Random Matrix program to co-organize a workshop during the first semester. I am collaborating with Helene Massam (who is also the group leader for the project) on a forthcoming paper.
Regularization and Covariance: I have been regularly attending meetings for this group and have taken part in lively research discussions. I am collaborating with Debashis Paul and Helene Massam on regularization techniques for likelihood inference problems.
Secondary working groups
Multivariate Distributions: I was invited to give the first presentation to the working group, with Webex connection to Stanford/Berkeley, MIT and other parts of the US. As with other working groups, I have been involved in the lively discussions.
Geometric methods: I have been regularly attending meetings for this group and have taken part in lively research discussions.
In addition to involvement in the working groups, I was the sole post-doctoral fellow to take part in teaching the Random Matrix course. With guidance from Prof. Jack Silverstein, I had to improvise at short notice to prepare to teach a part of the class. I was primarily involved in teaching some of Alan Edelman and Raj Rao’s recent work, including the Random matrix theory tool, to the graduate class.
Presentations
• Marginal likelihood for the eigenvalues of covariance matrices. Presentation to the Multivariate Distribution working group of the Random matrix theory program (Sep 2006).
• Flexible covariance estimation in graphical models. Presentation to the transition workshop at the American Institute of Mathematics in Palo Alto (Bala Rajaratnam, Helene Massam and Carlos Carvalho).
• Objective Bayesian Inference and Covariance matrices. Presentation at the Bayesian Focus Week, SAMSI (with Tom Diciccio; explaining the risk calculation and fielding questions).

Publications/Work in progress (for current academic year only)
• Rajaratnam, Massam and Carvalho (2007). Flexible Covariance Estimation. About to be submitted.
• Massam, Paul and Rajaratnam (2007). Regularization for likelihood inference. Work in progress, to be submitted in approximately 2 months.
• Chandan K. Reddy, Hsiao-Dong Chiang and Bala Rajaratnam (2006), “Stability Region based Expectation Maximization for Model-based Clustering”, in Proceedings of the IEEE/ACM International Conference on Data Mining (ICDM), Hong Kong, December 2006. (Acceptance rate is 9.4%.)
• Diciccio, T., Rajaratnam, B., and Wells, M.T. (2007). Marginal Likelihood Inference for eigenvalues. Nearly complete for submission.
• Mason, S. J., J. S. Galpin, L. Goddard, N. E. Graham, and B. Rajaratnam (2007). Conditional exceedance probabilities. Monthly Weather Review.
• Reddy, Chiang and Rajaratnam, “TRUST-TECH based Expectation Maximization for Learning Finite Mixture Models” (under second revision with IEEE Transactions on Pattern Analysis and Machine Intelligence, October 2006).
Activities / Awards
• Invited presentation to the Joint Statistical Meeting (Salt Lake City, July/Aug 2007).
• Invited to apply for faculty positions without having sent applications.
• Invited presentation to the transition workshop at the American Institute of Mathematics.
Future Research
I plan to continue to work on research in Random matrix theory. There are several open problems that my mentors and I have identified, and we will be pursuing them actively. I have not drawn up a definitive research agenda for next year as this has to be done in consultation with the Vice Dean of the School of Humanities and Sciences at Stanford University, Palo Alto, California.

Bala Rajaratnam – Hélène Massam’s Commentary
I first met Bala Rajaratnam when he interviewed for a position in my department. It was clear that Bala was an exceptional candidate and I immediately advised him to apply for a postdoctoral position at SAMSI. I was asked to be his mentor and I was glad to accept. As soon as the SAMSI program started with the Opening Workshop in September, Bala and I started working together on several fronts. At the Opening Workshop, Bala was immediately identified as the postdoc we all relied on when we wanted to make sure that speakers had their slides ready on the common computer for their talk, that the microphone was on, etc. He was also of invaluable help in the organization as well as the running of the Bayesian Focus Week, a week-long workshop I organized in October, as part of the larger program, and also in the running of the workshop on Large Graphical Models and Random Matrices organized by Nanny Wermuth. He was ready to help with the scheduling of the various talks as well as with the more menial task of carrying overhead projectors. At the same time, I could see him actively interacting with various participants in the workshop, learning about the most recent results and making new contacts. This certainly showed that he is a young fellow keen to help and keen to learn. Most importantly, he is a young researcher who is extremely bright, very knowledgeable, and keen to work very hard.
During this program, I introduced him to the research area of graphical models. Graphical models are multivariate statistical models where dependencies between the various variables can be represented by means of a graph. This allows for sparsity in the parameter space, a condition which is necessary for the identification of dependencies in data sets with a large number of random variables and relatively few data points. It also allows for statistical inference in a recurrent way for a small number of variables at a time.
During the first term, under my guidance, Bala learned the fundamentals of graphical model theory. At the same time, together with Carlos Carvalho, a research associate at Duke University, we started working on a new method for estimating large covariance matrices in a graphical Gaussian model. While I took care of most of the theoretical graphical model results in the first term, Bala was quick in learning all this material and was able to improve our work in the second term in at least three ways. First, he added a decision-theoretic component to the work already done; second, he prompted us to add the study of the eigenvalues of these large covariance matrices; and finally, he did a superb job of programming, allowing us to verify the properties of our estimators on several numerical examples. I should say here that his knowledge of Matlab was initially not sufficient to do the necessary programming. He spent long hours teaching himself the necessary skills. We presented this work at the Transition Workshop that took place in April 2007 at the American Institute of Mathematics. I believe the work was very well received. The work has been written up and will soon be submitted for publication to the Annals of Statistics as part of the special Large Random Matrices SAMSI Program issue. Bala is listed as first author on this paper. The AIM workshop has just finished. Bala was interacting with other participants, working on regularization and on the study of eigenvalues of large random matrices. He is clearly already considered one of the main researchers in these areas and he is on his way to also being one of the main researchers in the area of graphical models. I have nothing but praise for this young scholar. It has been a pleasure working with him this year and I hope our cooperation will continue in the future.

Elaine Spiller – Activity Summary
Primary Working Group
Granular Materials – Engineering applications: Our primary objective is to draw hazard maps around volcanoes that tend to have pyroclastic (hot ash) flow events. Such flows are rapid, so there is no time for evacuation as they move toward populated areas. Clearly such events are devastating and (hopefully) extremely rare. Their rarity raises many challenges. Our study begins with a simulation tool, TITAN2D (a pyroclastic flow PDE solver that incorporates digital elevation maps). A single flow simulation is expensive (~1 hour), thus Monte-Carlo simulations alone are not sufficient to capture (and map) the rare events of interest, i.e., those that affect populated areas. Currently I am working on sampling methods (a combination of importance sampling and emulator evaluations) in order to run simulations that will flow into populated areas (typical input values, inferred from data, yield almost no inputs that lead to flows in populated areas). Importance sampling allows us to correct for running the simulator at extreme inputs so we can calculate probabilities of such events. I recently presented our problem and current activities at the SAMSI/MUCM Mid-Program Workshop. We are currently discussing ideas for papers.
Secondary Working Groups
Climate and Weather: I am currently working on a data assimilation/particle filtering problem (described below) on a point vortex model. My ultimate goal is to take the tools I’m developing in that problem and apply them to a climate modeling problem. This part of the project has not yet begun, but I plan to attend a workshop and summer school at NCAR in order to frame a problem.
Methodology: I am not participating in a project directly in this group, but coming from an applied math background these group meetings have been invaluable.
Specifically, they have helped me both fill in my statistics background and learn how statisticians approach problems.
Other Projects
Point vortex data assimilation: I have developed a particle filter to assimilate tracer and vortex dynamics in a point-vortex model of fluid flow. I have reproduced experiments done previously with an extended Kalman filter on this system. I am currently working on classifying common failure modes of the particle filter and on methods to systematically improve the filter's performance specific to each failure mode. I am particularly interested in using the dynamics of the physical system to improve the filter performance.
Previous projects: I am continuing work on a problem that grew out of my thesis work. The main objective of the work is to predict failure rates in nonlinear lightwave systems, specifically systems with rapidly varying dispersion and noise forcing. The governing equation of these systems is nonlinear and nonlocal, but the equation possesses several symmetries (phase rotation, scale invariance, etc.). These symmetries and the dynamics of the associated linear operator can be used to guide importance-sampled Monte-Carlo simulations. Specifically, one can use these symmetries to bias noise in a manner that makes errors occur more frequently than would naturally be the case. In particular, I use this model and method to study errors in optical communications systems and lasers used in optical atomic clocks.
Presentations
• SAMSI/MUCM Mid-program workshop, “Predicting Geological Mass Flows from Field Data,” April 2007
• SAMSI Two-day undergraduate workshop, “Rare events in nonlinear lightwave systems,” November 2006
• Duke University Applied Mathematics Seminar, invited talk, “Rare events in dispersion-managed nonlinear lightwave systems,” October 2006
• SIAM Conference on Nonlinear Waves and Coherent Structures, “Rare events in dispersion-managed nonlinear lightwave systems,” September 2006
Other Activities
• Presented an undergraduate-level research talk and assisted with a hands-on problem solving project at the Two-day undergraduate workshop
• Reviewed articles for Journal of Lightwave Technology and Photonics Technology Letters
• Organize and maintain the Granular Materials – Engineering applications website
Recent publications
• J. Li, E.T. Spiller, G. Biondini, “Noise induced perturbations of dispersion-managed solitons,” accepted for publication (pending minor revisions) in Physical Review A.
• J. Li, E.T. Spiller, G. Biondini, W.L. Kath, “Symmetries, conservation laws and linearized modes of the dispersion-managed nonlinear Schroedinger equation,” in progress.
• E.T. Spiller, W.L. Kath, “Rare events in phase-modulated nonlinear lightwave systems,” in progress.
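
The importance-sampling idea described in the Granular Materials working group summary above (run the simulator at extreme inputs, then reweight to recover the probability of the rare event) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the "simulator" is a hypothetical stand-in in which the rare event is simply a standard normal input exceeding a threshold, and the shifted-normal proposal is one common choice. It is not the TITAN2D workflow or the sampling scheme under development.

```python
import math
import numpy as np

def rare_event_probability(threshold, n_samples, rng):
    """Estimate P(X > threshold) for X ~ N(0, 1) by importance sampling.

    Hypothetical stand-in for an expensive simulator: the rare event is
    a standard normal input exceeding the threshold.
    """
    # Draw inputs from a proposal centred on the rare region, N(threshold, 1).
    x = rng.normal(loc=threshold, scale=1.0, size=n_samples)
    # Likelihood ratio of the nominal N(0, 1) density to the proposal density.
    weights = np.exp(-threshold * x + 0.5 * threshold**2)
    return float(np.mean(weights * (x > threshold)))

rng = np.random.default_rng(0)
estimate = rare_event_probability(4.0, 100_000, rng)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # closed form for comparison
print(estimate, exact)
```

With plain Monte Carlo, 100,000 draws from N(0, 1) would produce only a handful of exceedances of 4, whereas roughly half of the proposal draws land in the rare region and the weights correct for the oversampling.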

Elaine Spiller – Bruce Pitman’s Commentary
My comments are limited to my interactions with Elaine regarding the granular materials sub-program at SAMSI this year. I know she has been actively involved in continuing work with colleagues, but I cannot address that effort. Elaine was a little slow to get involved in the granular materials/volcanology project. I think this hesitation was principally due to her not knowing how she might contribute and fit into the work. In conversations with her during the Fall, I offered that the rest of the working group didn’t quite know how the efforts of the working group might evolve, but there seemed to be interesting underlying questions to be addressed nonetheless. As things played out, Elaine did get more actively involved and contributed more actively to the group’s work. Her contributions have been positive.
During our conversations, it became clear that Elaine was trying to make the most of her appointment at Duke, to meet the math faculty there and to learn about the research of the applied mathematicians in the Department. The feedback I have had from colleagues in the Department is that Elaine has made a positive impression on them. Elaine is aware of the need for published results to come out of her time at SAMSI and is eager to make that happen. Overall she seems quite aware of the expectations on her, and the things she needs to do to advance her career.

Guillaume Vernieres – Activity Summary
Working group: Blood flow
I have implemented an ensemble Kalman filter and smoother for a model representing the blood flow in the circle of Willis (back of the brain). The model was developed by Kristen DeVault, Pierre Gremaud and Mette Olufsen from NC State. The intent of the study is to estimate the first and second moments of a few parameters that define the open boundary of the problem. The assimilation method uses a strong constraint approach in which we have assumed the model to be perfect. The degrees of freedom come from the initial values of the state vector and 18 parameters. The blood velocity and pressure data were provided by a medical group from Washington DC and analyzed and checked for consistency by Kristen DeVault and Mette Olufsen. We are currently writing a paper (Validation of a cerebral blood flow model: the example of the Circle of Willis) that we are planning to submit to the Journal of Physiology this summer. This work will also be presented at 2 conferences on physiology.
Working group: Climate, subgroup on weather forecast
The WRF-1d (Weather Research and Forecasting) model is a column model derived from the 3d WRF. One of its purposes is to estimate, at a much lower computational cost, a few parameters that can be used in the 3d WRF for the purpose of forecasting and nowcasting. The parameters to be estimated are known to have non-Gaussian pdfs, which requires the use of non-standard data assimilation methods. We propose to estimate the posterior pdfs for the parameters, dynamical parameter equations and the dynamics of the WRF-1d model. We plan to implement an ensemble smoother based on Langevin sampling. This work is in collaboration with Montse Fuentes from NC State, Josh Hacker from NCAR and Amit Apte from MSRI. Preliminary results will be presented at NCAR in June 2007.
Project: Lagrangian Data Assimilation
This project is in collaboration with Chris K. R. T. Jones and Kayo Ide.
I have implemented a realistic ocean model of the Gulf of Mexico to study the impact that Lagrangian information has in the data assimilation procedure. The data assimilation method is based on ensembles and loosely on the use of the representer theorem. The intent is a proof of concept in a realistic setting for an algorithm developed by Ide et al. in 2002. Using Lagrangian observations to estimate the state of a model has been shown to be very efficient. Through the study of the geometrical and temporal structure of the covariance matrix projected on the data set (the representers) we intend to shed light on the cause of the efficiency. Some preliminary results were presented at the Stennis Space Center (NASA) in October 2007 by Kayo Ide. A poster was presented at MSRI during a dynamical systems workshop.
Presentations
• “Mathematical modeling of the ocean” presented at an undergraduate class at UNC.
• “Lagrangian data assimilation in the Gulf of Mexico” poster presented at Oregon State and MSRI.
• Talk presented by Kayo Ide at NRL (NASA).
Manuscripts in progress
• Lagrangian data assimilation in the Gulf of Mexico: A proof of concept in a realistic setting. Guillaume Vernieres, Kayo Ide, Chris K. R. T. Jones. To be submitted to the Journal of Geophysical Research.
• Validation of a cerebral blood flow model: the example of the Circle of Willis. Kristen DeVault, Pierre Gremaud, Mette Olufsen and Guillaume Vernieres.
• On the nonlinearity of the Kuroshio South of Japan. Guillaume Vernieres, Robert N. Miller, Laura Ehret. To be submitted to the Journal of Geophysical Research.
• Assimilation of dynamic topography data in a 2 layer quasigeostrophic model of the Kuroshio south of Japan. Guillaume Vernieres, Robert N. Miller, Laura Ehret. To be submitted to the Journal of Ocean Modelling.
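
The ensemble Kalman filter mentioned in the blood flow working group summary above estimates parameters by updating an ensemble of augmented vectors (state plus parameters) against observations. The Python sketch below shows one analysis step of a stochastic (perturbed-observation) ensemble Kalman filter on a toy problem. Everything in it is an assumption made for illustration: the function name, the linear observation operator, and the two-component toy model are hypothetical and are not the circle of Willis model or the implementation used in this work.

```python
import numpy as np

def enkf_analysis(ensemble, obs, obs_operator, obs_cov, rng):
    """One stochastic (perturbed-observation) ensemble Kalman filter update.

    ensemble:     (n_state, n_members) forecast ensemble
    obs:          (n_obs,) observation vector
    obs_operator: (n_obs, n_state) linear observation matrix H
    obs_cov:      (n_obs, n_obs) observation error covariance R
    """
    n_members = ensemble.shape[1]
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    obs_anomalies = obs_operator @ anomalies
    # Sample estimates of P H^T and H P H^T from the ensemble.
    cov_xy = anomalies @ obs_anomalies.T / (n_members - 1)
    cov_yy = obs_anomalies @ obs_anomalies.T / (n_members - 1)
    gain = cov_xy @ np.linalg.inv(cov_yy + obs_cov)
    # Each member assimilates its own noisy copy of the observation.
    perturbed = obs[:, None] + rng.multivariate_normal(
        np.zeros(len(obs)), obs_cov, size=n_members).T
    return ensemble + gain @ (perturbed - obs_operator @ ensemble)

rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, size=2000)            # uncertain parameter
state = 2.0 * theta + rng.normal(0.0, 0.1, 2000)   # toy model output
ensemble = np.vstack([state, theta])               # augmented state
H = np.array([[1.0, 0.0]])                         # only the state is observed
updated = enkf_analysis(ensemble, np.array([2.0]), H, np.array([[0.01]]), rng)
print(updated[1].mean(), updated[1].std())         # posterior moments of theta
```

Although only the state is observed, the parameter is corrected through its sample covariance with the state, and the mean and spread of the updated ensemble give estimates of the first and second moments of the parameter.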

Guillaume Vernieres – Chris Jones’ Commentary
Guillaume has been associated with the Computer Models program at SAMSI and has been involved in two modeling projects and has been very active in a working group on blood flow. In a joint project with me, he has developed a working model of the Gulf of Mexico. This is a layered model that is capable of assimilating data and he has started to obtain some novel results running this in a parallel-processed mode. He has also collaborated on a project with Montse Fuentes (Statistics, NCSU and a leader of the CompMod program) on atmospheric modeling. Vernieres has engaged very well in SAMSI activities and his lively presence has added considerably to the life of the institute through the working groups and other informal interactions.

Gentry White – Activity Summary
Primary Working Group: Engineering Methodology
Current Work
I have presented papers and work concerning the development of state-space based emulators for application to engineering problems. In this work I have collaborated with Peter Reichert, Susie Bayarri, Tom Santner and Bruce Pitman. We are currently working on our first paper to feature examples of this methodology applied to a ground water runoff hydrological model. I have also presented at the joint workshop held with the MUCM group, and will present our paper at the Joint Statistical Meetings in August of this year.
Presentations
• "Bayesian Kalman Filter for Emulation of Complex Engineering Models", SAMSI/MUCM Mid-Program Workshop, April 3, 2007
• "Spline Based Conditionally Autoregressive Model for Spatial Data", 32nd Annual Spring Lecture Series, University of Arkansas, April 13, 2007
• "Bayesian Kalman Filter for Emulation of Complex Engineering Models", Joint Statistical Meetings, Salt Lake City, UT, August 2, 2007
Other Activities
• Maintained website for Engineering Methodology Working Group 2006-2007
• Participated in SAMSI Undergraduate workshop, May 2007

Gentry White – Tom Santner’s Commentary
Gentry White earned his PhD in Statistics from the Department of Statistics at University of Missouri-Columbia in August 2006. This is his first year as a SAMSI post-doc. Gentry’s PhD research was closely tied to his graduate research assistantship at the Missouri Cancer Registry; this research concerned the development of three spatio-temporal models for cancer incidence and mortality rates. The first of these models was based on a CAR prior for the spatial effects and on simple linear trends for temporal effects. A second model used a joint prior based on an intrinsic autoregressive (IAR) prior for temporal trends and a CAR model for spatial trends. A third model used a spatial prior based on the thin-plate spline solution for spatial trends and an IAR prior for temporal trends. The prior based on thin-plate splines suggested a method for dimensionality reduction that he also developed during his research. During his year at SAMSI, he has been working to develop a dynamic-model emulator, initially as a rapidly computable metamodel for a finite element code that describes the temporal performance of the components of an aircraft engine, and more recently for an application in hydrology. He has presented the development of this work during the Engineering Methodology Working Group, at the Joint SAMSI/MUCM Mid-Program Workshop, and will summarize the results of this work at the 2007 JSM Meetings. More generally, he will mentor undergraduates during the SAMSI undergraduate workshop in May 2007 and present two technical sections at this workshop.

B.3.2 Postdoc Experience Evaluations

One of the benefits of the Postdoc Lunches was the opportunity to involve the Postdocs themselves in constructing an evaluation instrument to assess Postdocs’ perceptions of the SAMSI experience. Together with the Directorate, they constructed the list of ten questions that has been in use at SAMSI ever since. This Section contains the complete unedited responses from the 2006-7 SAMSI Postdocs.

1. Program Involvement: Which SAMSI program(s) have you been involved with and at what level(s)? (e.g., “been doing active research on …” or “just went to a tutorial/workshop to learn about that area which is new to me”)
2. Interactions with Other Institutions: Describe your interactions with other institutions while at SAMSI. (e.g., a Triangle university, NISS or CRSC)
3. High Points at SAMSI: What have been the high points of your SAMSI experience?
4. Suggestions for Improvement: How could your SAMSI experience have been improved?
5. Mentoring: SAMSI aims to provide solid mentoring of Postdocs. How successful has this been in your case?
6. SAMSI Benefits for the Future: How do you think your SAMSI experience compares to what you might have encountered in a typical university setting?
7. SAMSI in contrast with University Setting: How do you think your SAMSI experience compares to what you might have encountered in a typical university setting?
8. SAMSI in comparison to Other Experiences: If you have had experience in an industrial, national or other lab setting, how would you compare that experience with your SAMSI experience?
9. Other Research while at SAMSI: If you have spent considerable time during your appointment with some other research group, please comment on the benefits and drawbacks of this experience. (e.g., if your appointment was extended via an appointment at NISS, CRSC or elsewhere)
10. Other Issues: Are there other issues or concerns you would like to bring up?

Ariel Cintron-Arias 1. Program Involvement: I have been involved with the SAMSI program called “Development, Assessment and Utilization of Complex Computer Models”. I attend the following working groups under this program: [1] Dynamics of Infectious Diseases (co-organizer) [2] Methodology (attendee) [3] Calibration of Computational Models of Cerebral Blood Flow (webmaster) 2. Interactions with Other Institutions: I am appointed through the Center for Research in Scientific Computation of North Carolina State University (NCSU). Since I have weekly meetings with two postdoctoral mentors in NCSU as well as with an additional mentor in SAMSI, I spend three days per week in the latter and two days in the former 3. High Points at SAMSI: It has allowed me broadening my training into Bayesian statistics including, MCMC methods and Kalman filters. The combination of attending working groups, workshops, formal collaborations, as well as informal conversations with other postdoctoral fellows who are statisticians have enhanced my training in modeling and parameter estimation of transmission processes. 4. Suggestions for Improvement: Perhaps the working groups can hold their meetings bi-weekly as opposed to every week. 5. Mentoring: I have three postdoctoral mentors. Dr. H. T. Banks (Inverse problems and parameter estimation), Dr. Alun Lloyd (Modeling of infectious diseases), and Dr. Peter. Reichert (MCMC methods and environmental modeling). I feel their mentoring comes from three different perspectives that compliment each other serves me with a broader view of modeling and parameter estimation. 6. SAMSI Benefits for the Future: Indeed. In addition to the technical collaborations, another benefit is a professional network made accessible by SAMSI, which consists of contemporary and former postdoctoral fellows as well as members of the SAMSI directorate such as Ralph Smith (NCSU), Jim Berger (Duke), Cristopher Jones (UNC), Alan Karr (NISS) and Nell Sedransk (NISS). 
Their advice during the initial stage of an academic career is extremely helpful. 7. SAMSI in contrast with University Setting: One hand, its location makes it very good for secluding oneself and getting a lot of research accomplished. There is a substantial number of talks and seminars held in SAMSI. On the other hand, its location makes it at times a little bit distracting having to drive far way to reach a library or attend seminars in one of the partner institutions, namely, Duke, University of North Carolina-Chapel Hill, and North Carolina State University. Another downside, is the lack of remote access to the SAMSI computer servers. 8. SAMSI in comparison to Other Experiences: Yes. I spent 10 weeks as a summer intern in Sandia National Laboratories- California during my second year in graduate school. Later, during my fourth year in the Ph.D. program I visited in Los Alamos National Laboratory for one year. As a SAMSI postdoctoral fellow I am released from teaching as well as administrative duties. It is a big plus that I don’t have to deal any bureaucracy in SAMSI, which is in my opinion one of the big downsides of some national laboratories. 9. Other Research while at SAMSI: N/A. 10. Other Issues: Everything is fine.

James Crooks 1. Program Involvement: Computer Modeling; I am involved with much of the work done by the Methodology and Terrestrial Models working groups 2. Interactions with Other Institutions: I attend some of the weekly colloquia at Duke’s Institute for Statistics and Decision Sciences. I also continue previous graduate work with epidemiologists and statisticians at UNC-Chapel Hill. 3. High Points at SAMSI: The postdoc-centered format and the large degree of professional interaction 4. Suggestions for Improvement: . The postdoc lunches are too few and far between to be as useful as they could be, and I wish there was an official postdoc journal club or regular bull session. 5. Mentoring: Very successful regarding subject matter mentoring (“will this idea work?”), less so for professional development (“where should I go from here?”). 6. SAMSI Benefits for the Future: It is allowing me to instantiate a pretty big career move for which I am extremely appreciative. Beyond that, it is helping me by introducing me to a huge number of people who I will probably be interacting with for years to come. Perhaps of most long-term importance, though, the regular group meetings are teaching what are and will be the interesting and important research areas. 7. SAMSI in contrast with University Setting: There is probably vastly more unforced interaction, and interaction within larger groups. Also, because the postdocs are the workhorses I feel more camaraderie with my colleagues than I probably would. 8. SAMSI in comparison to Other Experiences: N/A 9. Other Research while at SAMSI: What do you mean by ‘other research group’? ‘Other’ compared to what? I will assume you mean ‘other than SAMSI’. I am still involved in an epidemiology project from my graduate school days, though only for a few hours a month. I think it is good for me, though, so be able to watch a long-term project unfold and to keep in touch with some very smart people. 10. 
Other Issues: SAMSI is not perfect, but it is the most engaging place I’ve ever worked. I wish I could stay for more than two years!?

Cari Kaufman 1. Program Involvement: My research activities have primarily been in the computer models program. I was also involved in the random matrices program, leading a working group on climate and weather and attending the meetings of the regularization group. 2. Interactions with Other Institutions: My position is a joint appointment with the Geophysical Statistics Project at the National Center for Atmospheric Research (NCAR), with this past year spent at SAMSI and next year spent at NCAR. Part of this arrangement involves reciprocal visits to the other institution each year, so I spent three weeks at NCAR in November and will make another three week visit this semester. I have a formal collaboration with Steve Sain at NCAR, which will continue next year. I have also been collaborating with Jim Clark at Duke University, as well as Derek Bingham at Simon Fraser University, which I visited in March. 3. High Points at SAMSI: The informal interactions with statisticians working in diverse areas have been invaluable. The mid-program workshops have been very helpful as well, for directing the course of ongoing research. 4. Suggestions for Improvement: I feel lucky to be working with Jim Clark’s forest ecology group at Duke, but I regret not becoming more involved in research with members of the Institute for Statistics and Decision Sciences at Duke during my time here in North Carolina. 5. Mentoring: I’ve received useful advice from Jim Berger and from the postdoc lunches held by the SAMSI directorate. 6. SAMSI Benefits for the Future: I anticipate that the new collaborations I’ve begun here at SAMSI will continue for at least several years. These have been invaluable in developing an independent research program. I think my interactions with visiting faculty from other institutions have also given me a clearer sense of the field as a whole, which will be helpful in directing my academic job search. 7. 
SAMSI in Contrast with a University Setting: I think the level of interaction and feedback in the working groups has spurred my research to progress much more quickly than it might have in a university department. I see this as one of the major benefits of SAMSI’s program focus, having a “critical mass” of researchers in a similar area who can identify research problems and make rapid initial progress on them. 8. SAMSI in Comparison to Other Experiences: I spent two summers at the National Center for Atmospheric Research while I was a graduate student. That experience allowed me to get involved directly with subject-area researchers. In contrast, SAMSI visitors are primarily mathematicians or statisticians. However, the wide range of research activities in the Triangle area means this is not a true limitation. 9. Other Research While at SAMSI: The time I’ve spent at NCAR has been very beneficial in making progress on the regional climate models project. The only drawback is that scheduling conflicts between SAMSI and NCAR during my upcoming visit mean that I will not be able to participate fully in the May undergraduate workshop at SAMSI and will also have to miss a day of the transition workshop for the climate group. 10. Other Issues: I think the occasional social activities organized by the staff (for example, the salad club) have been valuable, and I would suggest having more of them.

Manjunath Krishnapur 1. Program Involvement: In the program on High Dimensional Inference and Random Matrices, I was a postdoc at SAMSI and an active member of the Universality working group. I made a few presentations and worked on the problems discussed (see my activity report for details). On a more mundane level, I managed the web page and helped (a little) Peter Miller in organizing matters. 2. Interactions with Other Institutions: I was appointed through UNC, Chapel Hill. I had discussions with Amarjit Budhiraja on certain large deviation problems (unrelated to random matrices) but no concrete theorem has emerged so far. I had an excellent office at Chapel Hill where I sometimes worked, and I also made great use of the library. 3. High Points at SAMSI: Getting to meet and discuss with many of the leading figures in the field was a great experience for a beginner like me. Academically the working group meetings, the opening workshop, the weekly random matrix course were most helpful. In a different way, having an office in beautiful and lush green surroundings full of deer and turtles was a great stimulant for contemplating. 4. Suggestions for Improvement: If only there was a more efficient public transport to get to SAMSI from Chapel Hill (and back)! Also being unable to walk to any place for lunch was occasionally annoying. The fact that I could not access SAMSI computers from outside was a pain too. 5. Mentoring: My case was slightly exceptional in that I was a postdoc for six months only. Within that period, I had as good guidance as possible. I owe a lot to discussions with Ofer Zeitouni, Jack Silverstein, Peter Miller, and Debashis Paul. There were postdoc lunches that met once a month to discuss academic issues, but as I was there only for a few months, I cannot judge the effectiveness. 6. SAMSI Benefits for the Future: Most of the problems I am working on currently are shaped by what I learned at SAMSI (and of course by my Ph.D. 
at Berkeley before that). My knowledge of random matrix theory was hazy before the workshop. Also, I got to know many random matrix theorists, and that is of obvious use to me if I am going to work in this field. 7. SAMSI in Contrast with a University Setting: Not having to teach allowed me to learn much more in a shorter time than I could have in a university and left me free to do research. In principle, there are some negatives also (eg., in a university I would attend many lectures in other branches of mathematics, which I did not get to do at SAMSI, naturally). But the clever arrangement of associating each postdoc through a university took away most of the negatives such as the lack of a library. Thus effectively there are no disadvantages at all, as long as the workshops running at the time are of interest to oneself. Also the staff at SAMSI are immensely helpful and friendly. That helps! 8. SAMSI in Comparison to Other Experiences: No such experience. 9. Other Research While at SAMSI: No such experience again. 10. Other Issues: None that I can think of, except the grumblings in response to questions

Jayanta Pal 1. Program Involvement: My stay at SAMSI started in September 2006. I participated in the High Dimensional Inference and Random Matrices program in Fall 2006 and Spring 2007. I was involved in the Inaugural Workshop at the Radisson hotel, RTP, and thereafter joined three working groups. The main focus of the first working group, headed by Professor Donald Richards and Professor Iain Johnstone, was in Multivariate Distributions, Eigenvalue theory. The second working group concentrated on Geometric analysis of the space of positive-definite matrices and finding measures of concentration for such matrices. In both of these working groups, I was involved in a leading role, helping to set the theme of regular discussions, and set up collaborations to work on specific projects. 
I attended regular meetings in a third working group, focusing on the modeling of climate phenomena. 2. Interactions with Other Institutions: I have been interacting with Duke University, Institute of Statistics and Decision Sciences. I attended the weekly seminars, and looked at the course schedules. I also had meetings with faculty members in the ECE department of Duke, and the Maths department of North Carolina State University. I had a few meetings with faculty members in the Biostatistics department of UNC, Chapel Hill. 3. High Points at SAMSI: In the last 6-7 months, I met a number of leading experts in Statistics, and learned about modern techniques in High Dimensional Inference and Random Matrices. It will be immensely useful for me to focus on the research problems from the area of multivariate distributions, random matrices theory and dimension reduction techniques. 4. Suggestions for Improvement: I feel that some emphasis on regular postdoc and faculty meetings could have been helpful. In addition, the statistics department of the triangle universities can participate in regular postdoc graduate student interactions. 5. Mentoring: My SAMSI directorate member was Professor Chris Jones, who was very helpful for me in acclimatizing in the new work atmosphere. He took care of the opening workshop and explained how things work at SAMSI. My scientific mentor is Professor Donald Richards. I worked with him in one of the working groups, namely on multivariate distributions, and had many useful discussions that led to research and collateral thinking. We collaborated on a number of projects and hope to have some significant publications over the next year or so. 6. SAMSI Benefits for the Future: The SAMSI environment helped me to extend my breadth in research to new horizons. I feel that any researcher in Statistics should have a significant width of topics to work in. 
SAMSI has tremendous scope in inter-disciplinary research, and I was able to venture in areas outside my dissertation and pick up new research problems. I would like to venture in more and more such problems. 7. SAMSI in Contrast with University Setting: SAMSI is a research institute, and the focus is on application of statistics to more and more inter-disciplinary problems. Though the University provides the presence of astute researchers from other fields and their own scientific problems, SAMSI is unique in that sense. Moreover, the regular programs on current important themes bring on cutting-edge researchers with all their unique ideas, and it helps the post-doctorate fellows with useful experience. 8. SAMSI in Comparison to Other Experiences: I haven’t had any such experience before. 9. Other Research While at SAMSI: I had collaborations with individual faculty members from remote universities, continuing some research on my dissertation topics. However, that was done simultaneously with the SAMSI programs, and apart from an allocation of time, there was no immediate problem. 10. Other Issues: I would like to mention that currently SAMSI does not have any funding available to buy books etc, or any research funds to travel except for one or two specific conferences. Some encouragement is needed for participation in more and more expositions.

Bala Rajaratnam 1. Program Involvement: I have been involved with the Random matrix program and at the level of post-doctoral fellow. 2. Interactions with Other Institutions: I have had an opportunity to interact with visitors from the Random matrix program and was able to exchange research ideas – this process has benefited my research. I am unable to say too much about my interactions with other institutions thus far because I have saved the travel support given to postdocs for the JSM. 3. High Points at SAMSI: Working and interacting with Prof. Helene Massam and others like Debashis Paul and learning about random matrix theory. 4. Suggestions for Improvement: As I do not have another research institute to compare with I am not sure how my SAMSI experience could have been improved. 5. Mentoring: My mentor, Prof. Helene Massam, has been truly a great mentor. 6. SAMSI Benefits for the Future: It is difficult to assess this immediately – I hope to have a clearer view of the impact of my SAMSI experience sometime in the near future. 7. SAMSI in contrast with University Setting: Once again, I am unable to accurately compare SAMSI to a typical university setting – both environments have their advantages and disadvantages. 8. SAMSI in comparison to Other Experiences: I have not had experience in an industrial, national, or other lab setting. 9. Other Research while at SAMSI: I have spent time mainly with my pre-allocated research group. 10. Other Issues: Not particularly – I would like to thank some of the people from the academic side of the random matrix program and some of the administrative staff for their kind assistance.

Elaine Spiller 1. Program Involvement: I am deeply involved in the compmod granular materials working group. To a lesser degree, I am involved in the methodology and climatology working groups. I further discuss my involvement in these groups in my activity summary. 2. Interactions with Other Institutions: I regularly attend both the applied math seminar (I spoke at it last October) and the statistics seminar at Duke. Also, I am working on a data assimilation project with a statistics professor at UNC -- we meet weekly. Additionally, I am continuing a collaboration that began last year when I worked with a professor and graduate student at SUNY at Buffalo. We have one paper accepted for publication and are currently writing a second. 3. High Points at SAMSI: • Working on new research problems in a supportive environment • Discussing research with colleagues and mentors with different backgrounds and hence different perspectives 4. Suggestions for Improvement: • Ability to access computers away from SAMSI. Not having access to the computers away from SAMSI makes them much less useful for me and seems like a waste of resources – if I could access them from home, I would run significantly more simulations on the SAMSI computers. • More in the way of travel funding • A more inviting common area for breaks/eating. I think this would promote more discussions between people who aren’t directly working together. Couches would be nice too. 5. Mentoring: Quite successful I think. 6. SAMSI Benefits for the Future: This will certainly be easier to answer in a few years, but I believe the access to collaborators and interesting problems is invaluable for both my scientific and professional future. 7. SAMSI in Contrast with a University Setting: Having spent a year as a postdoc at a university I can say that my time at SAMSI has been a very different experience. Specifically the variety of research problems/collaborations to choose from is far greater at SAMSI. 8. 
SAMSI in Comparison to Other Experiences: NA. 9. Other Research While at SAMSI: I think my work outside of SAMSI is quite valuable. My background and the “community I was raised in” are fairly different (both the people and the scientific/mathematical interests) than the community at SAMSI. I think it is important to stay active in both communities. I also think my activity in each helps me bring a broad perspective to the other. 10. Other Issues: N/A

Guillaume Vernieres 1. Program Involvement: I have been involved in the 2006-07 program on “Development, Assessment and Utilization of Complex Computer Models”. I'm a leader of one of the subgroup on climate modeling. I am also involved in the modeling of blood flow. 2. Interactions with Other Institutions: I mainly interact with Kayo Ide from UCLA and Chris Jones from UNC on a project on Lagrangian data assimilation. I also interact with Pierre Gremaud and Mette Olufsen from NC State for the modeling of blood flow. Lastly, I interact with Montse Fuentes from NC State, Josh Hacker from NCAR and Amit Apte from MSRI on the climate modeling project 3. High Points at SAMSI: Interacting with statistician is a valuable experience. 4. Suggestions for Improvement: Being able to access the computers from outside of SAMSI would have greatly improved the work efficiency. 5. Mentoring: I mostly interacted with Chris Jones, Montse Fuentes and Kayo Ide. Montse Fuentes helped me interact with oceanographers from NC State and help me define the climate project. Chris Jones and Kayo Ide are my mentors for the second year of my appointment. They are guiding me through the meanders of Lagrangian data Assimilation. 6. SAMSI Benefits for the Future: SAMSI opened my research interest to other field than the atmosphere and the ocean. It also made me think more about the statistical aspect of inverse method. 7. SAMSI in contrast with University Setting: NA 8. SAMSI in comparison to Other Experiences: NA 9. Other Research while at SAMSI: NA 10. Other Issues: NA

Gentry White 1. Program Involvement: I was involved in the Development and Assessment of Complex Computer Models Program, the Engineering Methodology Working Group I was the webmaster. 2. Interactions with Other Institutions: I have mainly been a VIGRE post-doc at NCSU 3. High Points at SAMSI: The workshops have been very good, mainly the ability to work with very talented people from all over the world that I ordinarily not be interacting with. 4. Suggestions for Improvement: Possibly a more formal mentoring program, though since I was just an associate, my experience might be different than other SAMSI post-docs. 5. Mentoring: See above. 6. SAMSI Benefits for the Future: I think that the networking and the people I have met will provide excellent future relationships to build on in my research. 7. SAMSI in Contrast with a University Setting: I think the SAMSI experience has focused much more on pure research than a university position would. 8. SAMSI in Comparison to Other Experiences: No other experience. 9. Other Research While at SAMSI: N/A 10. Other Issues: None

B.4 Tracking 2005-06 SAMSI Postdocs and Follow-up Evaluations

B.4.1 Tracking 2005-06 SAMSI Postdocs

SAMSI is following the careers of previous Postdocs by asking them to summarize the impact of their Postdoc fellowship at SAMSI once they have left to embark on their careers. A comprehensive survey was conducted in Spring 2006 to inquire specifically about past Postdocs’ career trajectories, continuing collaborative experiences, research results, publications and new plans. Results from this survey were reported in the 2005-6 Annual Report. The next comprehensive survey is planned for Spring 2008.

Amit Apte, (Ph.D., Physics, University of Texas at Austin) SAMSI Programs: Data Assimilation in Geophysical Sciences: − Langevin stochastic differential equations with applications to data assimilation − Dynamics of inertial particles in fluid flows and Lagrangian data assimilation National Defense and Homeland Security Program: − Agricultural Systems working Group Current Position: Postdoctoral fellow at the Mathematical Sciences Research Institute (MSRI), Berkeley, CA.

I am currently a postdoctoral fellow at the Mathematical Sciences Research Institute (MSRI), Berkeley, CA. I have accepted the offer of a faculty position at the Tata Institute of Fundamental Research Center, Bangalore, India and will be joining the Institute in August 2007. My current research has three themes: 1. the study of various phase space phenomena in area-preserving maps; 2. renormalization group techniques in dynamical systems; 3. data assimilation. My work on data assimilation, joint with Chris Jones of University of North Carolina, Chapel Hill, and Andrew Stuart and Jochen Voss of the University of Warwick, Coventry, UK, is a direct outcome of the interactions I had with them during the data assimilation program at SAMSI in the Spring semester of 2005. We take a Bayesian approach to the data assimilation problem [1] and construct the posterior distribution for the state of the system given the observations over a certain period of time. We proposed and implemented [2] the Langevin and Metropolis-Hastings methods for sampling this posterior distribution. We argue that this posterior distribution is the optimal solution to the data assimilation problem, and that approximate methods such as the ensemble Kalman filter (EnKF) or 4D-var (four dimensional variational data assimilation) should be compared to this posterior whenever it is available. Currently I am working on understanding the issues of model error in this Bayesian framework. Specifically, we are investigating how to extend the sampling methods to the case when the model is not perfect. I am also starting to work with a current SAMSI postdoctoral fellow, Guillaume Vernieres, on a project involving the application of these sampling techniques to a weather prediction model.
Publications:
• [1] A. Apte, M. Hairer, A. M. Stuart, and J. Voss, "Sampling the posterior: an approach to non-Gaussian data assimilation," Physica D (2006) (In press; Available online)
• [2] A. Apte, C. K. R. T. Jones, and A. M. Stuart, "A Bayesian approach to Lagrangian data assimilation," in preparation
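As an illustration only, the idea of sampling the posterior distribution of the system state given observations can be sketched on a toy problem. The sketch below is not the implementation used in [1] or [2]: the scalar linear model, the parameter values, and all function names are assumptions made for this example. It uses a random-walk Metropolis-Hastings sampler to draw from the posterior of the initial state, and a closed-form posterior mean (available here because the toy problem is linear and Gaussian) serves as a check.

```python
import math
import random

# Hypothetical toy setup (not from the report):
#   perfect model   x_{k+1} = a * x_k
#   observations    y_k = x_k + Gaussian noise with std obs_sd
#   Gaussian prior  on the initial state x_0
A, OBS_SD, PRIOR_MEAN, PRIOR_SD = 0.9, 0.5, 0.0, 2.0


def log_posterior(x0, obs):
    """Log posterior density of the initial state x0, up to a constant."""
    logp = -0.5 * ((x0 - PRIOR_MEAN) / PRIOR_SD) ** 2
    x = x0
    for y in obs:
        logp += -0.5 * ((y - x) / OBS_SD) ** 2
        x = A * x  # propagate the state with the (perfect) model
    return logp


def metropolis_hastings(obs, n_samples=20000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings chain targeting the posterior of x0."""
    rng = random.Random(seed)
    x = 0.0
    lp = log_posterior(x, obs)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        lp_prop = log_posterior(proposal, obs)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
        samples.append(x)
    return samples


def exact_posterior_mean(obs):
    """Closed-form posterior mean for this linear-Gaussian toy problem."""
    precision = 1.0 / PRIOR_SD**2
    weighted = PRIOR_MEAN / PRIOR_SD**2
    for k, y in enumerate(obs):
        precision += A ** (2 * k) / OBS_SD**2
        weighted += A**k * y / OBS_SD**2
    return weighted / precision
```

In a nonlinear or non-Gaussian setting no closed form is available, which is where a sampled posterior becomes useful as a reference against which approximate methods can be compared.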

Sava Dediu, (Ph.D., Applied Mathematics, Rensselaer Polytechnic Institute) SAMSI Programs: National Defense and Homeland Security Program − Agricultural Systems and Social Networks working groups − Sensitivity of dynamical systems to convex metric space parameters Current Position: CRSC Postdoctoral Fellow, Center for Research in Scientific Computation (CRSC), Mathematics Department, North Carolina State University

My current research interests are: Inverse Scattering Problems, Numerical Solutions for Partial Differential Equations and Sensitivity Analysis for Dynamical Systems. My interest in Sensitivity Analysis for Dynamical Systems started while I was a postdoc at SAMSI and a full member of the NDHS Program. It has been my major research field in the last two years. It resulted in three papers published in refereed journals (which reflect SAMSI experience and affiliation) and one paper in Conference Proceedings.
Refereed Journal Articles which reflect my SAMSI affiliation:
• “Recovering Inhomogeneities in a Waveguide using Eigensystem Decomposition” (with J. McLaughlin), Inverse Problems, Vol. 22, June, 2006, pp. 1227-1246.
• “Time Delay Systems with Distribution Dependent Dynamics” (with H.T. Banks and H.K. Nguyen), Annual Reviews in Control, accepted, to appear.
• “Sensitivity of Dynamical Systems to Parameters in a Convex Subset of a Topological Vector Space” (with H.T. Banks and H.K. Nguyen), Mathematical Biosciences and Engineering, to appear.
• “Stochastic and Deterministic Models for Agricultural Production Networks” (with P. Bai, H.T. Banks et al), Mathematical Biosciences and Engineering, to appear.
Conference Proceedings
• “Time Delay Systems with Distribution Dependent Dynamics” (with H.T. Banks and H.K. Nguyen), Plenary Paper, 6th IFAC Workshop on Time Delay Systems, L'Aquila, Italy, July 10-13, 2006
Talks at Invited Conferences
• Recovering inhomogeneities in a waveguide using eigensystem decomposition, IPAM, Reunion Conference for Inverse Problems: Computational Methods and Emerging Applications, UCLA Lake Arrowhead, CA, June 11-16, 2006
• Generalized Sensitivity Functions and their uses in biomathematical models, Atlantic Coast Conference on Mathematics in the Life and Biological Sciences, Virginia Tech, Blacksburg, Virginia, May 3 – 5, 2007

Lisa Denogean, (Ph.D., Statistics, Cornell University) SAMSI Programs: National Defense and Homeland Security Program − Data Confidentiality and Anomaly Detection working groups − Study of data swapping for categorical variables − New measures of data utility and risk for data swapping − Network modeling Current Position: Postdoc at North Carolina State University, Department of Statistics

SAMSI Primary Working Group
Data Confidentiality: I worked with Bahjat Qaqish and Alan Karr on a new approach to data swapping. I have continued this work, and we have written two papers that should be submitted for publication soon.

Current Projects
Testing for Patterns of Linkage Disequilibrium: This semester I am collaborating with Zhao-Bang Zeng from the Bioinformatics Research Center at NC State University and working on quantitative trait loci (QTL). I am considering modeling and testing for linkage disequilibrium in DNA sequences using Markov models. I am evaluating an appropriate hypothesis test for different levels of linkage disequilibrium and studying the distribution of the likelihood ratio test statistic. Currently I am trying to find conditions under which the asymptotic approximation is valid, and I hope to implement the hypothesis test accurately for a variety of populations.

Presentations
• Joint Statistical Meetings Invited Talk, “New Measures of Data Utility and Risk for Data Swapping”, July 2006

Other Activities
• I taught two courses at NC State: a graduate-level linear regression course and an undergraduate-level introductory statistics course.
• I am currently participating in Dr. Zeng’s weekly working group.

Recent Publications
• B. F. Qaqish, L. R. Denogean, A. F. Karr, “A Stochastic Process Approach to the Analysis of Swapping for Categorical Variables,” in progress.
• L. R. Denogean, A. F. Karr, B. F. Qaqish, “Model-Based Utility of Doubly Random Swapping,” in progress.

Moustapha Pemy, (Ph.D., Applied Mathematics, University of Georgia) SAMSI Programs: Financial Mathematics, Statistics, and Econometrics − Computational Issues, Portfolio Management, and Model Uncertainty working groups − Stochastic functional equations − Theory of viscosity solutions Current Position: Assistant Professor of Mathematics at Towson University

Personal Research: My postdoctoral experience at SAMSI was very fruitful. It helped me broaden my research interests and obtain a tenure-track position at Towson University. Most of what I am doing right now is a continuation of projects that started when I was at SAMSI. My research currently has three main directions. First, I have been working with my thesis advisor Prof. Qing Zhang and Prof. George Yin on various stock liquidation problems. Our paper “Liquidation of a Large Block of Stock” has been accepted for publication in the Journal of Banking and Finance. Together, we have also submitted two other papers on the subject for publication. The second main axis of my research is the optimal control of stochastic functional equations. Since last year, my postdoctoral supervisors Prof. Tao Pang and Prof. Mouhsiung Chang and I have submitted half a dozen papers on that topic. Finally, I am also working with Prof. Michel Lenczner, who was also visiting NCSU last year, on some control problems for Atomic Force Microscopes and Cellular Neural Networks.

Papers
• M. Pemy, Q. Zhang, Optimal stock liquidation in a regime switching model with finite time horizon, Journal of Mathematical Analysis and Applications, 312 (2006), 537-552.
• M. Pemy, G. Yin, Q. Zhang, Liquidation of a Large Block of Stock, to appear in the Journal of Banking and Finance, available online since December 2006.
• M. Pemy, M. Chang, T. Pang, Optimal Stopping for Stochastic Functional Differential Equations, submitted.
• M. Pemy, M. Chang, T. Pang, Optimal Control of Functional Stochastic Differential Equations with a Bounded Memory, submitted.
• M. Pemy, G. Yin, Q. Zhang, Liquidation of a Large Block of Stock under Switching Regime, submitted.
• M. Pemy, M. Chang, T. Pang, Numerical Methods for stochastic optimal stopping with delay, submitted.
• M. Pemy, G. Yin, Q. Zhang, Selling a large position: a stochastic control approach with state constraints, submitted.
• M. Pemy, M. Chang, T. Pang, Finite difference approximations for stochastic control system with delay, submitted.
• M. Lenczner, M. Pemy, On Controlled Cellular Neural Network, working paper.

Grants
• Henry C. Welcome Fellowship Grant from The Maryland Higher Education Commission, Towson University (2006-2009)

Jesus Rodriguez, (Ph.D., Probability, Cornell University) SAMSI Programs: Financial Mathematics, Statistics, and Econometrics − Credit Risk and Portfolio Management working groups − Multiple default derivatives − Stochastic portfolio theory − Pricing issues in energy markets Current Position: Postdoctoral Scholar in Statistics and Applied Probability at the University of California at Santa Barbara (UCSB)

Currently, I am a Postdoctoral Scholar in Statistics and Applied Probability at the University of California at Santa Barbara (UCSB). Jean-Pierre Fouque is at UCSB and is director of the Center for Research in Financial Mathematics and Statistics (CRFMS). We have continued the work we started together while at SAMSI. In August I will join the Math Department at as a visiting assistant professor.

Francisco Vera, (Ph.D., Statistics, University of South Carolina) SAMSI Programs: National Defense and Homeland Security Program − Data Confidentiality and Anomaly Detection working groups − Secure computations software − Software on Bayesian scan statistics − Micro aggregation Current Position: Assistant Professor, Department of Mathematical Sciences, Clemson University

B.4.2 Follow-up Evaluations

One critical impact of SAMSI programs is on individual careers. This complements SAMSI's impact on progress in statistics and applied mathematics through successful collaborations that fulfill the SAMSI vision of forging "a new synthesis of the statistical sciences and the applied mathematical sciences with disciplinary science to confront the very hardest and most important data- and model-driven scientific challenges."

In spring 2006, an extensive survey was conducted to evaluate this impact on participants in SAMSI programs, whether students, postdoctoral fellows, or junior and senior visitors. For this survey, the initial contact via email was followed by a telephone interview. The survey was designed to capture not only tangible evidence of SAMSI impact in the form of publications, research grants, awards, and continuation of collaborations originating in the SAMSI experience, but also the less tangible influence resulting in reorientation of research direction, course content and focus, and expectations for mentored [graduate] student research.

This evaluation survey successfully reached all 2004-5 SAMSI postdocs, between one-third and one-half of previous Program Leaders and long-term visitors, and a similar proportion of surveyed students. Results from this survey were analyzed and reported in the 2005-06 Annual Report. Another comprehensive survey is planned for spring 2008.

C. Graduate Student Participation

I. Development, Assessment and Utilization of Complex Computer Models

Engineering Methodology Working Group

Dianne Bautista (Statistics, The Ohio State University) Dianne visited SAMSI from September 1, 2006 until December 15, 2006 to participate in the Program on Development, Assessment and Utilization of Complex Computer Models. She participated in several working groups but worked primarily with the Engineering Methodology and Methodology working groups. During her visit, Dianne Bautista worked on two projects related to her thesis. The first focused on a non-parametric method for estimating (a valid) correlation function as part of the process of predicting the output of a computer code. The second project was to develop a method for sequentially designing a computer experiment with multivariate output to find the Pareto Frontier of the set of codes. Both projects are continuing upon her return to Ohio State.

Gang Han (Statistics, The Ohio State University) Gang visited SAMSI from September 1, 2006 until December 15, 2006 to participate in the Program on Development, Assessment and Utilization of Complex Computer Models. He participated in several working groups but worked primarily with the Engineering Methodology and Methodology working groups. Gang Han completed work on developing a methodology for computer experiment output whose inputs are either, by their nature, nominal-valued or should be treated as such. These types of inputs occur in biomechanics applications, where mesh density or the level of discretization of a functional input is such an input. He has developed software that allows the fitting of these mixed qualitative-quantitative models. Gang also started work on producing a methodology (and MATLAB software) for simultaneously determining calibration and tuning-parameter inputs to computer codes. This problem occurs in settings in which both computer and physical experiments have been conducted, the computer code has some calibration inputs whose values in the physical experiment are unknown to the experimenter, and, additionally, there are (numerical) tuning parameters, present only in the computer code, that can be used to force the computer code output nearer the physical experiment output.

Calibration of Computational Models of Cerebral Blood Flows Working Group

Kristen DeVault (Mathematics, NCSU) Kristen’s research focuses on models and numerical methods pertaining to blood flow in the brain and she was a key member of the Calibration of computational models of cerebral blood flows working group. She will visit Dr. Novak’s group at Harvard in the summer of 2007.

Fei Liu (Statistics, Duke). Fei Liu actively participates in a number of working groups: Air Quality, where she gave a presentation: “Calibration for spatial and spatio-temporal model outputs”; Engineering Methodology, where she gave two presentations: “Discussion of a Thermal Model”, and “Dynamic Linear Models as emulators”; Terrestrial models, and also in the Methodology working group, where she is the webmaster. She will be presenting at the Transition Workshop in May, and also at the JSM07 SAMSI Topic Contributed Sessions in July. Fei has also been working on a thesis in the computer modeling area as it applies to functional data. The approach developed for such data consists of representing the function in the wavelet domain, reducing the number of nonzero coefficients by thresholding, modeling the nonzero coefficients as functions of the associated inputs using nonparametric Bayesian methods, and reconstructing the functions (with confidence bands) in the original (time) domain. For computational reasons, an extension of this approach is considered to an eigenspace whose basis elements are linear combinations of the wavelet basis elements. The number of nonzero coefficients is greatly reduced in this eigenspace, as consequently is the computational expense for the statistical inverse problem. Finally, the thesis considers an approach to representing functions as multivariate Dynamic Linear Models. This approach is useful when the functions are highly variable and, as opposed to attempting to represent the functions exactly, one seeks primarily to capture relevant stochastic structure of the functions. The method has been tested with a simulated data set, and will be applied to validate the Community Multi-scale Air Quality model, considered in the Air Quality working group.

Simon Lunagomez (Statistics, Duke) Simon is working under the guidance of Robert Wolpert (also of Duke ISDS) to develop hierarchical Bayesian models for pyroclastic flows, intended to help predict the frequencies of large volcanic eruptions over decade- or century-long periods. The models have underlying Pareto components for large individual flow eruptions and α-stable components for the aggregation of many smaller flows, all tailored to the kinds of data emerging from the Montserrat Volcano Observatory (MVO), established in 1995 by the British Geological Survey and the University of the West Indies’ Seismic Research Unit to study the ongoing volcanic activity of the Soufrière Hills volcano. SAMSI participant Bruce Pitman (Math, Univ. Buffalo) and his colleague Eliza Calder (Geology, Univ. Buffalo, and MVO alumna) have secured access to the MVO data for Lunagomez and have assisted in its interpretation. A range of modeling, numerical, and methodological issues have already appeared in this work, giving Lunagomez a remarkable educational opportunity. This work will be part of his Ph.D. thesis. He is also in the early stages of building stochastic process-based models for Gamma Ray Bursts (GRBs), with the help of Cornell University astronomer and SAMSI participant Tom Loredo. Simon Lunagomez gave a presentation to the Methodology working group on March 12, 2007.

Justin Shows (Statistics, NCSU) Justin works under the supervision of Montse Fuentes. Current mesoscale numerical weather prediction (NWP) models use complex, multi- layer, soil and canopy models to specify time-dependent lower boundary conditions for atmospheric solutions. Parameters, with values typically constrained by empiricism or heuristic physical arguments, are ubiquitous in these models. The literature shows that atmospheric solutions can be sensitive to choices of a few of them.

In practice, many of these parameters can be viewed as tuning knobs, and subjective tuning is acknowledged practice in NWP implementations. Many parameters are not necessarily constant, and may vary slowly in time. Complex soil models attempt to account for a wide range of physical processes. Because these models provide lower boundary conditions for atmospheric models, and the metrics for success and utility are usually based in the atmospheric component, an argument can be made that simpler models should be constructed and objectively tuned. Simpler models usually have fewer parameters that control the atmospheric response, and their functional relationship to the atmosphere is usually more accessible. The goal of Mr. Shows’s project, in collaboration with NCAR and under the supervision of Dr. Fuentes (NCSU), is to design and explore optimal methodologies for finding distributions of parameters. Experience shows that ensemble data assimilation is a useful paradigm for approaching this problem, since covariances between prior distributions in observation space and parameter distributions are readily available. The 1D model described above also enables efficient research on this topic. Mr. Shows will be presenting his research at the May 21-24 SAMSI/NCAR workshop.

Richard Yamada (Applied Mathematics, Cornell) Richard’s research focuses on stochastic kinetic modeling of molecular motor dynamics. During an extended visit to SAMSI, he collaborated with Darren Wilkinson in the Systems Biology working group.

II. High Dimensional Inference and Random Matrices

Sergei Belov (Mathematics, Duke) carried out research on the asymptotic solution of Riemann-Hilbert problems that arise in random matrix theory. He numerically calculated the generation and evolution of adiabatic invariants and steepest descent contours of the Riemann-Hilbert problem using the modern technique of steepest descent. He was an active member of the Universality working group.

Hongyan Cao (UNC) participated in the Regularization and Covariance working group, and took the Random Matrices course. She worked on developing a forecasting technique for multivariate time series, which combines dimension reduction with univariate time series forecasting and regression techniques. She will give a presentation about her research to the graduate fellows. She also plans to attend the Undergraduate Outreach Workshop and act as an undergraduate mentor.

Zhenglei Gao (Statistics, Duke) was in the Bayesian Methods/Graphical Models working group last year. She attended the weekly meetings, read the papers to be discussed and worked on a project on the Bayesian analysis of spike train data. She also took the course Random Matrices.

William Lefew (Mathematics, Duke) used the classical steepest descent approach to definitively resolve a disagreement in the physics community over the nature and classification of optical transients which arise in the propagation of electromagnetic waves into a dielectric medium near resonance. He also studied numerically the statistical properties of the localization of eigenvectors of a random Jacobi matrix. He was an active participant in the Universality working group.

Trevis Litherland (Mathematics, Georgia Institute of Technology) works at the interface of random matrices and longest increasing/common subsequence problems. His time at SAMSI was very valuable since it afforded him the opportunity to deepen his understanding of random matrices in a very favorable environment. During his time there he finished a first paper.

Jinchi Lv (Mathematics, Princeton) worked on high-dimensional feature selection and statistical estimation during his SAMSI visit. A paper was written: Fan, J. and Lv, J. (2006). Sure independence screening for ultra-high dimensional feature space. He has accepted a tenure-track offer in the business school at the University of Southern California. The SAMSI program played an important role.

Xingye Qiao (Statistics, UNC) has been working on a problem that deals with classification with unbalanced datasets. In many classification problems, it is common to have a dataset with very uneven class proportions. If one uses the standard misclassification rate as the criterion for classification, the minority classes may be ignored. In this project, he considers the use of adaptive learning to increase the weights for minority classes so that those classes will be classified more accurately. The use of an L1 penalty for variable selection with high dimensional data is also considered. He took the Random Matrices course and attended the Biosystems Modeling Workshop and discussions in several other working groups. He will help with organizing the Undergraduate Workshop in May and will give a tutorial presentation.

Teresa Selee (Applied Mathematics, NCSU) has been working on the coefficient of ergodicity, which is most commonly thought of as a bound on the subdominant eigenvalues of a matrix. This term has its origin in stochastic processes, especially in giving information on the asymptotic behavior of sequences of stochastic matrices. Specifically, the coefficient of ergodicity gives the rate at which a Markov chain converges to an ergodic state. She has almost completed a draft of a thorough literature review and has extended the definition of the coefficient for stochastic matrices. Moreover, she has generalized it to non-stochastic matrices and has proved several bounds on the coefficient.
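As an illustration only, the following sketch computes one standard form of the coefficient of ergodicity for a stochastic matrix, the 1-norm version tau(P) = (1/2) max over row pairs of the sum of absolute row differences. This is an assumed textbook definition, not necessarily the definition or generalization developed in the work described above; the 2x2 matrix and the function names are hypothetical.

```python
def ergodicity_coefficient(P):
    """1-norm coefficient of ergodicity of a row-stochastic matrix P.

    tau(P) = 1/2 * max over row pairs (i, j) of sum_k |P[i][k] - P[j][k]|.
    (Assumed textbook definition, used here for illustration.)
    """
    n = len(P)
    return 0.5 * max(
        sum(abs(a - b) for a, b in zip(P[i], P[j]))
        for i in range(n)
        for j in range(n)
    )


def step(x, P):
    """One step of the chain: row vector x times matrix P."""
    return [sum(x[i] * P[i][k] for i in range(len(P))) for k in range(len(P[0]))]


def l1_distance(x, y):
    """1-norm distance between two vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Hypothetical two-state chain; its eigenvalues are 1 and 0.7.
P = [[0.9, 0.1],
     [0.2, 0.8]]
tau = ergodicity_coefficient(P)  # approximately 0.7 for this matrix
```

For probability vectors x and y, the distance between xP and yP is at most tau(P) times the distance between x and y, so a value of tau below 1 gives a geometric convergence rate, and every eigenvalue of P other than 1 has modulus at most tau(P).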

Dhruv Sharma (Statistics, NCSU) has worked on variable selection from a (Bayesian) decision theoretic perspective. He is due to present a seminar on this work soon and at the JSM in Salt Lake City this summer. He is involved in the SAMSI Undergrad Workshop this May and will be mentoring and presenting a lecture on Introductory Statistics and Probability.

Hua Xu (Mathematics, Georgia Institute of Technology) His research is devoted to proving concentration inequalities for random matrices with stable (heavy tailed) entries. The novelty of this work is twofold: no independence is assumed, and tight lower as well as upper bounds are obtained. A preprint on this work, which will be posted on arXiv, is almost ready.

Yingying Fan (Operations Research and Financial Engineering, Princeton) attended the opening workshop and visited for a short period afterwards.

Yufan Zhao (Biostatistics, UNC) was active in the regularization working group.

III. National Defense and Homeland Security

Stephen Zhou (Mathematics, NCSU) Stephen participated in the activities of the credit risk workgroup. During the semester, he continued his research under J.P. Fouque and defended his dissertation in February 2006. He has greatly benefited from the workgroup and the workshops at SAMSI. His research focuses on modeling the correlation of defaults which is one of the main issues in credit markets.

Agricultural Systems Working Group

Ping Bai (Statistics, UNC) Ping’s work focused on the stochastic modeling of agricultural networks. In our simplified model we consider the stochastic model of a pork food chain with a set of aggregated nodes interacting with each other according to certain realistic assumptions. We model the evolution of the food production network, which currently contains 4 nodes, as a continuous time Markov chain with discrete state space embedded in the non-negative integer lattice of R^4. The standard stochastic simulation algorithm (Gillespie) is used to obtain the long-term behavior of the system. On the other hand, the approximate long-term behavior of the appropriately scaled system is also analyzed via the usual macroscopic deterministic rate equations. This deterministic approach makes it possible to carry out the sensitivity analysis of the parameters involved in the model.
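The pairing of a Gillespie simulation with deterministic rate equations can be sketched as follows. This is a minimal illustration under stated assumptions, not the group's model: the network is assumed to be a simple linear chain of four nodes with a constant inflow and linear transfer rates, and the names `gillespie_chain`, `euler_rate_equations`, `inflow`, and `rates` are hypothetical.

```python
import random


def gillespie_chain(x0, inflow, rates, t_end, rng):
    """Gillespie simulation of a linear chain of nodes (hypothetical model).

    Events: one unit enters node 0 at constant rate `inflow` (must be > 0);
    one unit moves from node i to node i+1 (or leaves the last node)
    at rate rates[i] * x[i]. Returns the state at time t_end.
    """
    x = list(x0)
    t = 0.0
    while True:
        propensities = [inflow] + [k * n for k, n in zip(rates, x)]
        total = sum(propensities)
        t += rng.expovariate(total)  # exponential waiting time to next event
        if t > t_end:
            return x
        # Pick the event with probability proportional to its propensity.
        event = rng.choices(range(len(propensities)), weights=propensities)[0]
        if event == 0:
            x[0] += 1
        else:
            i = event - 1
            x[i] -= 1
            if i + 1 < len(x):
                x[i + 1] += 1


def euler_rate_equations(x0, inflow, rates, t_end, dt):
    """Forward-Euler solution of the matching deterministic rate equations."""
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        flow_out = [k * v for k, v in zip(rates, x)]
        flow_in = [inflow] + flow_out[:-1]
        x = [v + dt * (fi - fo) for v, fi, fo in zip(x, flow_in, flow_out)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    print(gillespie_chain([0, 0, 0, 0], 50.0, [1.0, 0.5, 2.0, 1.0], 20.0, rng))
    print(euler_rate_equations([0.0] * 4, 50.0, [1.0, 0.5, 2.0, 1.0], 40.0, 0.01))
```

In this assumed chain the rate equations settle at inflow / rates[i] for each node, and the stochastic counts fluctuate around those values, which is the kind of agreement that makes sensitivity analysis on the deterministic system informative.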

Angela Govan (Mathematics, NCSU) During my fellowship, I have been taking part in the Agricultural Systems working group, which is working on the issue of agricultural terrorism. Having knowledge of the mathematical modeling of contagious diseases, I am responsible for incorporating a model of Foot and Mouth disease into the Agricultural Network model developed by the group. Currently I am modifying an SIR (Susceptible-Infectious-Recovered) disease model to study the Foot and Mouth dynamics in the network.

Anomaly Detection Working Group

Shenek Hayward (Mathematics, NCSU) Shenek was a graduate student working in the anomaly detection working group. She also participated in the seminar course associated with the program. The anomaly detection group focused on the “scan statistic” and Shenek gave a presentation on one of the fundamental papers in the area. As a result of her interest in the topic, she and her SAMSI faculty advisor (D. Dickey) devised a reading course at NCSU in which they are studying a variation called the “double scan statistic”.

Data Confidentiality Working Group

Joyee Ghosh (Statistics, Duke) Joyee developed methods for performing nonparametric regression for horizontally partitioned data, i.e. data where three or more agencies possess the same attributes on different records. Her techniques utilize the secure summation protocol, which was demonstrated for linear regression by Karr et al. (2005). Joyee’s techniques add flexibility in modeling to the secure computation toolkit, which helps address the difficulty of agencies having to specify models without seeing others’ data. Joyee is preparing a manuscript for submission at the end of the semester. She presented the initial stages of her work to the NDHS confidentiality working group. This topic is not likely to lead to a dissertation, although it has been invaluable for Joyee to learn areas of statistics (data confidentiality and nonparametric regression) that she would not normally have studied.

Robin Mitra (Statistics, Duke) Robin developed methods for modifying survey weights when data are altered to protect confidentiality. This is neglected in the literature on data confidentiality, but it clearly is important: unaltered survey weights could increase the risks of disclosures and could decrease data quality, since the weights are no longer tied to the values of the data. This work will lead to Robin’s preliminary examination as part of his progress toward a PhD at ISDS. He plans to produce a manuscript for submission by the beginning of the next semester. Robin has become very interested in data confidentiality, and he would like to pursue a dissertation in this area.

Saki Kinney (Statistics, Duke) Saki participated in the working group discussions, which have helped her get a big picture of the issues in data confidentiality. Saki is working on methodology for generating multiply-imputed synthetic (i.e., simulated) data sets for public release. For her dissertation, she is working on the theory of performing large-sample significance tests when multiple imputation is used simultaneously to handle missing data and to replace confidential data.

Social Networks Working Group

Jen-hwa Chu (Statistics, Duke) Jen-hwa was involved in building agent-based models of social network dynamics. These models incorporate the latent-variable space approach described by Hoff, Raftery and Handcock (2002), as well as covariate information such as gender and memory of past relationships. The intent is to build rule-sets such that the dynamics of agent behavior mirror the dynamical models being developed from two other perspectives by other teams in the working group. The comparison of the models is based upon summary statistics from repeated runs, such as the mean and standard deviation of the number of persistent cliques, the first three moments of the in-degrees, the mean and standard deviation of the number of triad completions, and so forth.

John Samuels (Mathematics, NCSU) John’s research was conducted with Alan Karr, Hoan Nguyen (a SAMSI postdoctoral Fellow) and H.T. Banks. During the year they have developed a model for social dynamics based on dynamic agent-specific characteristics and pair-specific attractions. Coupled stochastic differential equations define the evolution of the system. The SDEs are solved by a classical fourth-order Runge-Kutta discretization procedure. The principal focus of the research is on the “richness” of the models as a function of the variances of the stochastic disturbances. Samuels has contributed substantially to the research and methodology and is a co-author on the paper “Sensitivity to noise variance in a social network dynamics model.” He also played a major leadership role in the SAMSI Undergraduate Workshop held on May 30-June 3.

Eric Vance (Statistics, Duke) Eric studied social network behavior in elephant herds using data collected by ethologists. He fit the Hoff, Raftery, and Handcock (2002) latent variable version of the dyadic p* model and found that group dynamics change between the wet and dry seasons, and that genetic relatedness and the social hierarchy play a large role in elephant networks. The inferences are Bayesian, and use Markov chain Monte Carlo to find the posterior distributions of each of these effects. He has written a paper on the social network methodology, and it has been submitted to the Journal of Organizational Computation.

Chien-Chung Wong (Mathematics, NCSU) Chien-Chung, working with Medhin, developed a model where each actor is endowed with a set of dynamic personal attributes, values, and preferences, and a set of statistical information on each of the other actors in the social group. If the social network consists of N actors, we construct an NxN matrix of zeros and ones, called the sociomatrix. If actor i is friendly toward actor j, then the ij-th entry of the matrix will be one, otherwise zero. The diagonal entries of the matrix are set to zero. If the ij-th entry of the matrix is 1, then we say there is a link from i to j. In the model the ij-th link depends on the maximum of a payoff of an appropriately constructed nonlinear programming problem involving the attributes and values of the actors in the social group. The model can be modified to handle social status as well as general network dependence structure. For example, the link from i to j may not be completely independent of the link from i to k, and/or from j to k. The model also incorporates migration and preferred attributes. The approach developed captures the ideas of the well-established P1 model introduced by Holland and Leinhardt, and the more recent extension, the P2 model, due to van Duijn, Snijders, and Zijlstra. In particular, the model developed reflects reciprocity, and attributes and values of actors i and j play a role in determining whether or not there is linkage between these actors.
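The sociomatrix construction described above can be illustrated with a small sketch. It covers only the data structure (directed links, zero diagonal) and two simple summaries, in-degrees and reciprocated pairs; it does not attempt the payoff-based nonlinear programming step. The four-actor link list and the function names are hypothetical.

```python
def sociomatrix(n, links):
    """Build an n x n 0/1 sociomatrix from directed links (i, j).

    Entry [i][j] is 1 when actor i is friendly toward actor j.
    Diagonal entries stay 0, so self-links are ignored.
    """
    A = [[0] * n for _ in range(n)]
    for i, j in links:
        if i != j:
            A[i][j] = 1
    return A


def in_degrees(A):
    """Number of incoming links for each actor (column sums)."""
    return [sum(row[j] for row in A) for j in range(len(A))]


def mutual_dyads(A):
    """Number of unordered pairs {i, j} with links in both directions."""
    n = len(A)
    return sum(
        1 for i in range(n) for j in range(i + 1, n) if A[i][j] and A[j][i]
    )


# Hypothetical network of four actors.
links = [(0, 1), (1, 0), (0, 2), (2, 3), (3, 2), (1, 3)]
A = sociomatrix(4, links)
```

The count of mutual dyads is one simple way to quantify the reciprocity that the P1 and P2 models are designed to capture.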

IV. Financial Mathematics, Statistics and Econometrics

John Hyde (Mathematics, Duke) John was a second-year graduate student. In collaboration with J.P. Fouque and Jonathan Mattingly, he explored models of company default. This became one of the major topics in Hyde’s preliminary exam to advance to PhD candidacy. John made progress in his understanding of credit risk modeling. He presented and critiqued a number of models to Mattingly. He also explored the model with simulations. He was trying to develop new models of corporate default which would give insight into how correlations in default times arise. John passed his exams at the end of the fall. Unfortunately, he chose to stop his pursuit of a PhD after passing his exams. Though he has left academic research, he is now pursuing employment in the financial industry. His departure has terminated this particular research project, though he hopes to apply what he learned at SAMSI to others in the near future.

Arthur Sinko (Economics, UNC) Arthur participated in the activities of the model uncertainty and Lévy processes workgroups. During the semester he continued his research under Eric Ghysels. He started to work on three papers and has greatly benefited from the workgroup and the workshops at SAMSI. His research is about MIDAS regressions and how they relate to volatility modeling. This topic touches on both of the working groups he attended.

Jennifer Sloan (Statistics, North Carolina State University) Jennifer was mentored by Peter Bloomfield, who led the discussion in a Working Group session on “Credit Ratings.” Sloan jointly led the discussion at another session of the Credit Risk Working Group and participated in the “Special Topics in Financial Math” Course. She actively researched Credit Rating Transition and Credit Risk problems, and helped SAMSI in educational outreach activities.

Chong Tu (Statistics, Duke) I helped and took part in the Opening Workshop for Financial Mathematics, Statistics and Econometrics from September 18-21, 2005. I also took part in the transition workshop from February 27-28, 2006 and the Model Uncertainty Workshop. I attended the program courses “Advanced Topics in Financial Econometrics” and “Special Topics in Financial Mathematics.” I also joined the Model Uncertainty working group which had weekly meetings to discuss frontier papers.

Doug Vestal (Mathematics, North Carolina State University) Doug participated in a wide range of activities. In Fall 2005, he took a special topics course in Financial Mathematics that emphasized the growing fields of stochastic volatility models, credit risk, and the evaluation of real options. The course placed special emphasis on the current state of research in these fields. In this course, he was also exposed to the ideas that motivated the need for current research in these areas and some of the problems that remained to be solved. He was also a member of the SAMSI Credit Risk working group. In this group, various members presented current research trends in credit risk. He presented a model of recovery rates in a reduced form model along with some of his SAMSI colleagues. Other members gave presentations on the top-down approach to credit risk, stochastic volatility and default correlation, credit ratings models, and a model for the unified valuation of equity and credit derivatives. As Doug will be doing his dissertation in credit risk, this working group was very helpful. In addition to a thorough review of the literature, he was able to see the current gaps in research to help develop ideas for his own research. In fact, because of the credit risk workshop, he is working on research to develop a new model of recovery rates. This is in addition to the research he started working on (along with Dr. Fouque and Dr. Carmona, two SAMSI participants) to develop an algorithm that enables the computation of extremely rare events with applications towards intensity based models in credit risk. For the two-day Undergraduate Workshop in Financial Mathematics at SAMSI in October 2005, Doug wrote a document explaining various types of derivative contracts for the undergraduates. In addition, he explained how to implement, in Matlab, the first passage approach to the pricing of risky bonds. Among other things, this entailed teaching the students about yield curves and how to generate random variables. Dr. Fouque presented the binomial tree method for option valuation, and Doug showed the undergraduates how to implement it in Matlab to value an Asian option. At the SAMSI Undergraduate Workshop coming up in May 2006, he will be helping to teach, organize and execute a five-day workshop on inverse problems along with the other SAMSI Graduate Fellows and Postdocs. This workshop will place particular emphasis on developing the intuition behind modeling physical processes mathematically, the process of data collection, and the statistical analysis of the data collected for parameter estimation.

Yichao Wu (Statistics and Operations Research, UNC) From August 2005 to December 2005, I was a graduate student fellow associated with the SAMSI program in Financial Mathematics, Statistics, and Econometrics (FMSE). During the program, I attended the opening workshop and the transition workshop of FMSE, in addition to the opening workshop of the program on National Defense and Homeland Security and one tutorial in the Astrostatistics program. These workshops brought me to the frontiers of the corresponding areas and introduced me to many interesting problems in interdisciplinary study. Additionally, I joined the working group on Lévy Processes led by Prof. George Tauchen. He invited experts in this area to present their recent work and lead discussions. In particular, he invited Prof. Torben Andersen to present “Jump detection in Finance”. I found this really interesting and am trying to work on some related problems. Prof. Enrique Figueroa also led a discussion on “An Overview of a Nonparametric Estimation Method for Levy Processes”. All of these activities improved my understanding of how statistics can be applied to other areas and help them, which I think is very important for a graduate student majoring in statistics.

V. Astrostatistics

Floyd Bullard (Statistics, Duke) Floyd was an active member of the group, attending each meeting, maintaining the group’s webpage, and coding MCMC, importance sampling and other algorithms for model fitting and model selection. His graduate work is focused on activities of the working group. He has given a presentation in a student seminar series at Duke on the search for exoplanets, and at SAMSI as part of the graduate student and post-doc seminar series. He was one of several graduate students involved in the SAMSI Astrostatistics Program in the spring of 2006. He maintained the web page for the Exoplanets Working Group (http://www.samsi.info/200506/astro/workinggroup/exo/) and kept minutes of the weekly meetings. At two or three working group meetings he gave brief presentations of the results of some of his work, such as trying to solve a model selection problem using a new technique (integrating over a parameter space using nested sampling). Following up on the SAMSI workshop, he was a research assistant for Merlise Clyde (ISDS, Duke University) during the Fall of 2006, during which time they explored the problem of integrating over a highly multimodal space using nested sampling. He has now begun working on his Ph.D. thesis, which grew out of his participation in the SAMSI program. His thesis topic is “Improving the Efficiency of Scheduling Radial Velocity Measurements for Exoplanet Detection Using Bayes and a Fast Integral Estimator”.

Matthew Fleenor (Physics, UNC) Matthew’s thesis research (under Prof. James Rose, an astronomer) concerned studying dynamical and kinematical properties of galaxy clusters via spectroscopic observations of the constituent galaxies. Matt frequently attended SPS working group meetings to learn about open issues and current research on survey analysis methods. He made a special effort to visit SAMSI during the SPS intensive session, e.g., consulting with Martin Hendry. His thesis work was largely completed by the time of the SAMSI program, so the program did not directly impact his thesis work. Matt is now on the faculty in the Physics Department at Roanoke College in Virginia.

Pablo de la Cruz (Statistics, University of Valencia) Pablo is working on his Ph.D. thesis under the joint supervision of Vicent Martinez (Astronomy) and Jose Miguel Bernardo (Statistics) at the University of Valencia. Pablo resided at SAMSI throughout most of the astrostatistics program, participating predominantly in the Surveys and Population Studies (SPS) group, but also in the Exoplanets group. Pablo was the youngest student participating in these groups; he was a second-year student at the time. He participated in nearly every Exoplanets and SPS working group meeting. He also interacted extensively with researchers when they visited SAMSI, often scheduling one-on-one meetings to learn about their work and methods. He prepared an extensive presentation on “Statistics for the Large Scale Structure”, providing a survey of work on quantifying 2D and 3D structure in the galaxy distribution, and reporting on work in progress with Martinez. Pablo cites his extensive personal interaction with researchers as the most important and rewarding aspect of his SAMSI participation. His peer students in statistics at Valencia for the most part get assigned research problems by their advisors after their second year. Pablo instead is exploring several possibilities together with Martinez; he credits his SAMSI visit with exposing him to a much wider variety of problems and methods than he would have otherwise known about, allowing him to play a much more active role in developing his thesis program. Also, Pablo spent considerable time at SAMSI exploring statistical computing environments, taking advantage of researchers’ varied experiences in many environments to learn about their strengths and weaknesses. He did calculations in R, C, Mathematica, and Python at SAMSI (he has settled on a combination of R and C for his thesis). He also learned MCMC algorithms and especially the importance of output diagnostics.
As a measure of the success of the program, he notes that the closing workshop (SCMA2006) was the first scientific meeting he has attended where he felt he really understood the majority of the topics being discussed, and felt involved with the research.

Hyunsook Lee (Statistics, Penn State) Hyunsook is a statistics graduate student with an undergraduate background in astronomy. She attended tutorials and the astrostatistics kickoff workshop. During that time, she presented a poster titled “Convex Hull Peeling: Nonparametric Multivariate Data Analysis.” Other related results were presented at Interface 2006 (Detecting Outliers in Multivariate Massive Data by Convex Hull Peeling with Applications), SCMAIV (Nonparametric Approach to Multivariate Massive Data Analysis by Convex Hull Peeling), and JSM2006 (A Nonparametric Approach to Descriptive Measures of Multivariate Massive Data Based on Convex Hull Peeling Depth). After the workshop, she joined various focused working group meetings: Exoplanets, Source and Feature Detection, Gravitational Lensing, Particle Physics, and Survey and Population Studies. She maintained the websites for the Survey and Population Studies working group and for the Particle Physics group. She was very helpful in providing Survey and Population Studies working group astronomers with information about the strengths and weaknesses of information criteria for model selection (e.g., AIC vs. BIC), and with information about computational geometry tools. She finished her dissertation and graduated from Penn State in 2006. She was an invaluable assistant for the closing workshop SCMAIV. Feedback on her poster presentation at the kickoff workshop was reflected in her dissertation and other later presentations. She is in the process of writing papers on model selection with a jackknife method and on nonparametric massive data analysis with convex hull peeling. The first topic is theoretical in nature, and the second focuses on developing algorithms for exploratory data analysis with some supporting theory.
Finally, participating in the program as a graduate student led her to a postdoctoral position at the Harvard-Smithsonian Center for Astrophysics, where she is the only statistician among 900 researchers.

Nicholas Robbins (Mathematics, Duke) Nicholas maintained the public web-page for the Gravitational Lensing working group. He is in the early stages of his thesis work with Professor Bray. Topics covered in the lensing session may be integrated in his thesis.

Lingsong Zhang (Statistics, UNC) Lingsong is interested in multivariate outlier detection and functional data analysis using singular value decomposition. He is currently in charge of maintaining the website for the Source and Feature Detection working group, and he is also an active participant in its discussions. Lingsong has developed visualization tools for functional data, and is currently working on multi-resolution outlier detection methods for detecting outliers in long-range dependent time series, with applications in Internet anomaly detection. He is in the astrostatistics program to look for interesting astronomy applications to which he can apply his visualization tools and outlier detection methods. He is also interested in developing new methodology for challenging astronomy problems.

Brendon Brewer (Physics, University of Sydney) Brendon’s thesis research (under Prof. Geraint Lewis, an astronomer) uses Bayesian methods to address inverse problems in astronomy associated with gravitational lens and asteroseismology data. He was originally invited to participate in the gravitational lens group, but correspondence with Petters indicated that the topics the group was focusing on would not directly address his research interests. However, he was very interested in learning about Bayesian and other methods employed in the SPS and Exoplanets groups. Due to his location in Australia, remote participation was not feasible, so Brendon’s participation was limited to two weeks, when he attended the SPS and Exoplanets intensive research sessions. Brendon was particularly interested in computational techniques for model selection, a topic that arose in both the SPS and Exoplanets groups. Inspired by talks he heard at SAMSI, on his return to Sydney he pursued research on marginal likelihood methods, changing the approach he had previously taken for his work (he is presently using annealed importance sampling; related methods were pursued at SAMSI, especially by Phil Gregory). Brendon met Martin Hendry via the SPS group, and Martin invited him to the University of Glasgow to give a seminar on his thesis work. Brendon has also become interested in survey issues, particularly Malmquist bias (which may play a role in analysis of gravitational lens systems). He discussed approaches to handling Malmquist bias with Loredo, Hendry and Chernoff, and hopes to pursue research on this topic after his thesis is completed.

Bodhisattva Sen (Statistics, Michigan) Bodhisattva is working with Michael Woodroofe and Moulinath Banerjee on his dissertation. A portion of his thesis will be on applications of Statistics in High Energy Physics (more specifically, on the construction of confidence intervals in the presence of nuisance parameters in examples that arise frequently in HEP). Bodhi attended both the opening workshop on Astrostatistics (in January 2006) and the intensive session on statistical issues in Particle Physics (in March 2006). In the session on Particle Physics, Michael Woodroofe presented joint work with Bodhi Sen, “On the Unified Method with Nuisance Parameters”, which has now been submitted for publication in a Statistics journal.

D. Consulted Individuals

The individuals consulted for the broad selection of topics within programs and workshops were the members of two groups:

• The Program Organizers, listed in Section I.A.1.

• Members of the Advisory Committees, listed in Section I.J.

The specific topics that Program Working Groups chose to pursue were, in general, selected by the Working Group participants themselves, according to their combined interests. In almost all cases, however, a Program Leader headed each working group, so that specific research topics remained consistent with overall program goals. In Section I.E, the various Program Working Groups, and their members, are discussed.

E. Program Activities

1. Program on Development, Assessment and Utilization of Complex Computer Models

1.1 Introduction

Mathematical models intended for computational simulation of complex real-world processes are a crucial ingredient in virtually every field of science, engineering, medicine, business, and in everyday life as well. Cellular telephones attempt to meet a caller’s needs by optimizing a network model that adapts to local data, and people threatened by hurricanes decide whether to stay or flee depending on the predictions of a continuously updated computational model. Two related but independent phenomena have led to the near-ubiquity of models: the remarkable growth in computing power and the matching gains in algorithmic speed and accuracy. Together, these factors have vastly increased the applicability and reliability of simulation, not only by drastically reducing simulation time, thus permitting solution of larger and larger problems, but also by allowing simulation of previously intractable problems. The intellectual content of computational modeling comes from a variety of disciplines, including statistics and probability, applied mathematics, operations research, and computer science, and the application areas are also remarkably diverse. Despite this diversity of methodology and application, there are a variety of common challenges, detailed below, in developing, evaluating and using complex computer models of processes, and these relate directly to the mission of SAMSI.

1.2 Program Organization

1.2.1 Subprograms

The study of computer models needs to take place in the context of actual computer models. But because of the inherent complexity of computer models, and the very different types of such models, this SAMSI program is organized into subprograms, each focusing on specific computer modeling applications. This approach allows in-depth exploration of specific types of computer models, while maintaining an overall “SAMSI umbrella” that allows quick transfer of techniques developed in one subprogram to another. The following subprograms will be conducted during the year.

Environmental/Ecological/Climate Models Subprogram. The environmental modeling subprogram deals with problems and research fields at the interface between statistics and environmental modeling. These include problems of model calibration in the presence of structural model deficits and input uncertainty, problems of decision-oriented model application under high uncertainty about model structure and parameter values, and problems of universality or transferability of environmental models.

This Subprogram is led by Peter Reichert (EAWAG). At the moment, it has three distinct working groups: Air Quality, Climate and Weather, and Terrestrial Models. A fourth working group, Hydrology Models, has now completely merged with the Methodology working group.

Subprogram on Uncertainty in Models of Granular Materials: Sources and Consequences. Granular materials are ubiquitous. This subprogram aims to develop a better understanding of the variability that appears in, and indeed often dominates, the observed behavior of granular materials during flow and deformation. The field exhibits rich yet poorly understood physics. For example, today there is no first-principles explanation of the creation and breaking of force chains; there is no theory of the propagation of sound in a granular medium. There is a need for a working theory of the behavior of granular materials that can describe practical applications, such as a theory of materials handling, of bin loading, of granular avalanches and pyroclastic flows. The segregation of granular materials, by size, shape, density or composition, is but one example of a problem of fundamental physics with enormous practical application. There is active research in physics and in several branches of engineering on questions of granular material flow and deformation. Yet in many ways the field is in its infancy, not unlike the theory of fluid dynamics at the time of Navier, Boltzmann, and Stokes. This Subprogram aims to broaden and deepen the discussion among physicists, statisticians, mathematicians and engineers in pursuing new ideas to describe granular materials. More specifically, the Subprogram is ideal for interactions of statisticians with: experimental physicists and engineers examining the significant role of fluctuations in granular deformation; experimental physicists and mathematicians investigating segregation of granular materials; and engineers and mathematicians analyzing and computing macro-scale mathematical models of granular flow to understand uncertainty in those models. This Subprogram is led by Bruce Pitman (U. Buffalo).
It has formed two working groups, one on Engineering Applications of Granular Materials, and the second on the Statistical Mechanics and Physics of granular materials.

Engineering Subprogram. The engineering subprogram studies frequently occurring problem areas in finite-element and other engineering models. Specifically, it will focus on the problems of Validation, Calibration, and Combining Data from physical experiments and computer experiments. The emphasis is on applications where the computer models require substantial running times and the physical models are difficult or expensive, so that, in some cases, physical experiments can be conducted for only subcomponents of the desired system or a physical simulator may only be possible for the desired system. Issues of combining codes from system components to produce valid codes for the entire system can then arise. The design of both the physical and computer experiments is also of special interest. This Subprogram is led by Tom Santner (Ohio State U.). At the moment it has one working group: Engineering Methodology.

Biological Modeling Subprogram. This program will focus on three types of biological models. The first will be on models to predict cerebral blood flow. As a first step, a fluid dynamic model for the Circle of Willis will be developed; improvements will be investigated, and boundary conditions will be carefully considered. Model calibration based on partial data will be undertaken. A second focus of the program will be on systems biology models. Models used in systems biology range from small biochemical networks modeled with a set of coupled ODEs that can be simulated quickly on a standard PC, through to large spatio-temporal models of whole cell (or cell population) behavior run on large computing facilities. Some models are deterministic, while others are intrinsically stochastic, giving different output on each run. All typically contain uncertain parameters that must be estimated from sparse, noisy experimental data. Additionally, there is often uncertainty regarding model structure. Particular problems that arise in the context of systems biology models include: estimating large numbers of parameters from sparse data, parameter estimation using complex multivariate data, simultaneous estimation of model parameters and structure, and estimating parameters of complex stochastic models. Finally, models for the dynamics of infectious diseases will be considered, in particular for the impact of drug therapy and resistance on acute viral infections. These models are based on a multi-scale approach, integrating within-host models with (epidemiological) models that describe the spread of infection at the population level. Numerous questions exist in terms of fitting these models to data, validating the models and using them for assessment of the spread of viral infection. This Subprogram is led by Darren Wilkinson (U. of Newcastle). At the moment it has three active working groups: Calibration of Computational Models of Cerebral Blood Flow, Systems Biology, and Dynamics of Infectious Diseases.

Methodology Subprogram. This Subprogram engages in an in-depth treatment of methodological issues that arise in the design, analysis and utilization of computer models across many fields of application. This Subprogram evolves in close collaboration with the four disciplinary subprograms (Environmental/Ecological Models, Engineering Models, Uncertainty in Models of Granular Materials, and Biological Modeling), engaging them in an overall research umbrella. In trying to predict reality (with uncertainty bounds), some of the key issues that have arisen are: use of model approximations (emulators) as surrogates for expensive simulators, for calibration/prediction tasks and in optimization or decision support; dealing with high-dimensional input spaces; validation and utilization of computer models in situations with very little data, and/or functional (possibly multivariate) outputs; non-homogeneity, including jumps and phase changes in the output as we move around the input space; implementation and transfer of methodology to current practice; efficient MCMC algorithms and prior assessments; optimization and design. This Subprogram is led by M.J. Bayarri. It has one working group, Methodology, which overlaps considerably with the other working groups. The working group for Hydrology models has entirely merged with the Methodology working group.
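As an illustration of the emulator idea listed among the key issues above, the following minimal sketch fits a Gaussian process interpolator to a handful of runs of a toy function standing in for an expensive simulator, and then predicts cheaply at new inputs with a rough uncertainty measure. It is not taken from any SAMSI working group's code: the function `simulator`, the squared-exponential kernel with length scale 0.3, the zero prior mean, and the 12-point design are all assumptions chosen for illustration.

```python
import math

def simulator(x):
    """Stand-in for an expensive computer model (hypothetical toy function)."""
    return math.sin(3.0 * x) + 0.5 * x

def kernel(a, b, length_scale=0.3):
    """Squared-exponential covariance between two scalar inputs (assumed choice)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def solve_upper_from_lower(L, z):
    """Back substitution: solve L^T w = z using the lower factor L."""
    n = len(z)
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * w[k] for k in range(i + 1, n))
        w[i] = (z[i] - s) / L[i][i]
    return w

def fit_emulator(xs, ys, jitter=1e-8):
    """Fit a zero-mean GP interpolator to simulator runs (xs, ys)."""
    K = [[kernel(a, b) + (jitter if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    weights = solve_upper_from_lower(L, solve_lower(L, ys))
    return L, weights

def predict(x, xs, L, weights):
    """Emulator mean and standard deviation at a new input x."""
    k_star = [kernel(x, a) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, weights))
    v = solve_lower(L, k_star)
    var = max(kernel(x, x) - sum(vi * vi for vi in v), 0.0)
    return mean, math.sqrt(var)

# Twelve "expensive" runs on a design over [0, 2], then cheap predictions on a grid.
design = [2.0 * i / 11 for i in range(12)]
runs = [simulator(x) for x in design]
L, weights = fit_emulator(design, runs)
grid = [2.0 * i / 49 for i in range(50)]
max_abs_error = max(abs(predict(x, design, L, weights)[0] - simulator(x)) for x in grid)
print(f"max emulator error on grid: {max_abs_error:.2e}")
```

In practice the kernel parameters would be estimated from the runs rather than fixed, and the emulator's predictive uncertainty would be carried through to calibration or prediction; the sketch only shows the basic surrogate mechanism.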

1.2.2 Working Groups

The active working groups are meeting weekly throughout the year to pursue their particular research topics, identified in the kickoff and subsequent workshops and/or subsequently chosen by the working group participants. A few working groups have their activity concentrated in a shorter period of time. The working groups consist of SAMSI visitors, postdoctoral fellows, graduate students, and local faculty and scientists. A number of working group members do not reside at SAMSI or in the area, and take an active part in the meetings via teleconferencing and WebEx access. The working groups have active web pages on which material, notes, agendas and members are posted.

Air Quality. The working group is led by Serge Guillas (Georgia Institute of Technology), and it is focused on the study of air quality computer models. Statisticians and an EPA scientist, who runs and studies the air quality computer model CMAQ, are actively involved. The active participants are Serge Guillas (Georgia Tech), Chunsheng Ma (NCSU), Daiwen Kang (EPA), Fei Liu (Duke U.). Other occasional participants, or people who have shown interest and signed up for remote participation, include Chris Paciorek (Harvard U.), Brian Reich (NCSU), Howard Bondell (NCSU), John McHenry (MCNC), Montse Fuentes (NCSU), Kristen Foley (NCSU), Denise Swall (EPA), Steve Sain (NCAR), David Holland (EPA).

Calibration of Computational Models of Cerebral Blood Flows. The working group is led by Pierre Gremaud (NCSU). It is focused on the study and calibration of models describing cerebral blood flows. The working group consists of applied mathematicians, statisticians and physicists; it also works closely with experimentalists (group of Dr. Novak, Beth Israel Deaconess Medical Center, Harvard University). The participants are Kristen DeVault (NCSU), Pierre Gremaud (NCSU), Mette Olufsen (NCSU), Guillaume Vernieres (SAMSI postdoc/UNC) and Darren Wilkinson (Newcastle University).

Climate and Weather. The working group is led by Montserrat Fuentes (NCSU), Steve Sain (NCAR) and Jonty Rougier (U. Durham). This working group will focus on problems related to physical models of climate and weather, and their use and limitations for short- and long-term forecasting. The participants are Steve Sain (NCAR), Cari Kaufman (SAMSI), Montse Fuentes (NCSU), Jonty Rougier (U. Durham), Mike Dietze (Harvard U.), Randy Sitter (SFU), James Gattiker (U. Southampton), Tsui-Long Chen (NCSU), Serge Guillas (Georgia Tech), Chunsheng Ma (NCSU), Howard Bondell (NCSU), Guillaume Vernieres (UNC), Leonard Smith (Oxford and LSE), Christine Shoemaker (Cornell U.), Nancy Nichols (U. Reading), Jim Crooks (SAMSI), Elaine Spiller (Duke U.), Dorin Drignei (U. Oakland), Justin Shows (NCSU), Brian Reich (NCSU).

Dynamics of Infectious Diseases. This working group is led by H.T. Banks (North Carolina State University), Ariel Cintrón-Arias (SAMSI/NCSU), and Alun Lloyd (NCSU). Its focus is on longitudinal data acquisition and analysis, population and within-host models, and statistical and mathematical methodologies. The participants are Tom Banks (NCSU), Jeff Borggaard (Virginia Tech), John Burns (Virginia Tech), Adam Childers (Virginia Tech), Ariel Cintrón-Arias (SAMSI/NCSU), Gene Cliff (Virginia Tech), Cammey Cole (Meredith College), Jim Crooks (SAMSI), Marie Davidian (NCSU), Jimena Davis (NCSU), Sava Dediu (NCSU), Stacey Ernstberger (NCSU), Sarah Grove (NCSU), Shuhua Hu (NCSU), Abdul Jarrah (Virginia Tech), Sarah Lynn Joyner (NCSU), Grace Kepler (NCSU), Reinhard Laubenbacher (Virginia Tech), Alun Lloyd (NCSU), Henning Mortveit (Virginia Tech), Golnar Newbury (Virginia Tech), Betty Paredes-Alvarez (Virginia Tech), Carlos Rautenberg (Virginia Tech), Peter Reichert (SAMSI), Johnny Samuels (NCSU), Daniel Sutton (Virginia Tech), Karyn Sutton (Arizona State), Alan Veliz-Cuba (Virginia Tech), Paola Vera-Licona (Virginia Tech), Lizette Zietsman (Virginia Tech).

Engineering Methodology. The Engineering Methodology Working Group is led by Thomas Santner (The Ohio State University and SAMSI). This group works on problems that occur in physics-based computer models. Among the features that make these models unique are their long running times, the possibility of collaborating physical (or near-physical) experiments, the presence of calibration and tuning parameters, and functional and multivariate as well as real-valued outputs. The participants of this working group are Don Bartel (Cornell University), Dianne Bautista (The Ohio State University), Susie Bayarri (University of Valencia), Thomas Bengtsson (Bell Laboratories), Tiangang Cui (University of Auckland), Ian Dinwoodie (Duke University), Chris Gotwalt (SAS Institute), Genetha Gray (Sandia Laboratories), Eitan Greenshtein (N. Carolina State University/SAMSI), Gang Han (The Ohio State University), Dave Higdon (Los Alamos National Laboratories), Ying Hung (Georgia Tech), Herbie Lee (UC-Santa Cruz), Simon Lunagomez (Duke University/SAMSI), Abhyuday Mandal (University of Georgia), Scott Mitchell (Sandia Laboratories), Max Morris (Iowa State University), Nancy Nichols (Reading University), Abani Patra (University of Buffalo), Angie Patterson (General Electric), Mark Perry (LSE), Bruce Pitman (University of Buffalo), Grant Reinman (Pratt & Whitney), Jerry Sacks (NISS), Thomas Santner (The Ohio State University), Randy Sitter (Simon Fraser University), Curtis Storlie (NC State University), Laura Swiler (Sandia Laboratories), Matt Taddy (UC at Santa Cruz), Shih-Chung Tsai (GM Corporation), Gentry White (NC State University), Henry Wynn (LSE), Thanasis Kottas (UC at Santa Cruz), Bruno Sanso (UC at Santa Cruz), Dorin Drignei (Oakland University), Blaza Toman (NIST).

Granular Materials - Engineering Applications. This working group is led by Bruce Pitman (U. Buffalo). This group is interested in applying Bayesian methodology to problems involving granular materials. One of the principal applications is the hazard risk assessment of landslides or granular avalanches. Another application is to the forces and loads on the walls of a bin or hopper that stores granular material. The group on engineering applications includes Susie Bayarri (U. Valencia/SAMSI), J. Berger (SAMSI), Dorin Drignei (Oakland U.), Michael Goldstein (Durham U.), Nancy Nichols (Reading U.), Abani Patra (U. Buffalo), Luis Pericchi (U. Puerto Rico), Bruce Pitman (U. Buffalo), Michael Shearer (NCSU), Robert Wolpert (Duke), post-doc Elaine Spiller (Duke), and students Jennifer Joyce (UNC) and Simon Lunagomez (Duke); student Dalbey participates from Buffalo, and faculty member Calder in Buffalo contributes regarding the specific application to volcanic eruptions at Montserrat.

Inference and Uncertainty Analysis of Hydrological Models. The working group, led by Peter Reichert (Eawag), focuses on calibration and uncertainty analysis of hydrological watershed models. A simple hydrological model is used so that techniques designed for more complex (and slow) hydrological and, more generally, ecological models can be developed more efficiently. Three main topics are approached: (a) bias reduction with time-dependent parameters; (b) increasing the efficiency of time-dependent parameter estimation; (c) emulating hydrological models. More details are given under Research Goals and Activities. Due to the large number of working groups and the methodological focus of its work, the group merged with the Methodology working group.

Methodology. The working group, led by Susie Bayarri (U. Valencia/SAMSI) and Robert Wolpert (Duke), will work across multiple working groups to identify methodological issues deserving investigation, to work on the calibration and validation of models, and to address challenging issues in the implementation of large and complex models with multiple sources of uncertainty. Participants are Tony O’Hagan (Sheffield U.), Fei Liu (Duke), Cari Kaufman (SAMSI), Eitan Greenshtein (SAMSI), Tiangang Cui (Auckland U.), Herbie Lee (UCSC), Genetha Gray (Sandia), Simon Lunagomez (Duke), Tom Loredo (Cornell), Curtis Storlie (NCSU), Christine Shoemaker (Cornell), Nancy Nichols (Reading), Dianne Bautista (OSU), Leanna House (Durham U.), Michael Goldstein (Durham U.), Henry Wynn (LSE), Max Morris (Iowa State U.), Gang Han (OSU), Darren Wilkinson (Newcastle U.), Elaine Spiller (Duke), James Crooks (SAMSI), Mark Huber (Duke), Leonard Smith (Oxford and LSE), Sunyoung Bu (UNC), Ariel Cintron-Arias (SAMSI), Tsuei-Long Chen (NCSU), Chunsheng Ma (NCSU), Rui Paulo (Polytechnic U. Lisbon), Luis Pericchi (U. Puerto Rico), Ying Hung (Georgia Tech), Matt Taddy (UCSC), Abani Patra (U. Buffalo), Serge Guillas (Georgia Tech), Tom Santner (OSU), Gentry White (NCSU), Bruce Pitman (U. Buffalo), Zhiguang Qian (U. Wisconsin), Susie Bayarri (U. Valencia/SAMSI), Robert Wolpert (Duke), Jonathan Rougier (Bristol U.), David M. Steinberg (Tel Aviv U.).

Statistical Mechanics of Granular Flow. The working group is led by Sorin Mitran (UNC). This working group is interested in applying statistical and statistical mechanical approaches to new discoveries in the physics of granular materials. Using simulation techniques to complement experiments, the topics under investigation include stresses on particles in a simple Couette cell assembly, and the creation, propagation, and break-up of void regions in shear cells. The statistical physics working group includes Jennifer Joyce (UNC), Tom Loredo (Cornell U.), Bob Behringer (Duke), Peter Mucha (UNC), Karen Daniels (NCSU), Bruce Pitman (U. Buffalo), Sorin Mitran (UNC).

Systems Biology Models - Parameter Estimation. The working group led by Darren Wilkinson (Newcastle U.) is working on one of the key mathematical and statistical challenges that has arisen in the exciting new scientific discipline of Systems Biology, namely, parameter estimation for deterministic and stochastic biochemical network models. The main activity of the working group is in the spring of 2007, and it has therefore only recently been formed.

Terrestrial Models. This working group is led by Jim Clark (Duke). It concentrates on linking seasonal climate/weather variation to models of forest dynamics, including gaining an understanding of how this intermediate-scale variation in climate affects biodiversity and ecosystem processes. Participants are Jim Clark (Duke), Jerry Sacks (NISS), Steve Sain (NCAR), Cari Kaufman (SAMSI), Sean McMahon (Duke), Mike Dietze (Harvard), Jim Crooks (SAMSI), Benoit Courbaud (Duke), Robert Wolpert (Duke), Chris Paciorek (Harvard), Fei Liu (Duke), Susie Bayarri (U. Valencia/SAMSI).

1.3 Research Goals and Activities

1.3.1 Program Level

Apart from the goals specified in each of the working groups, the unified goals of the overall program are to i) promote interaction between the different working groups, ii) gain understanding and widen the possible goals by exposing researchers working on very specific areas or models to similar problems and goals for related models, iii) foster collaboration between modelers, statisticians, applied mathematicians and scientists, and iv) promote active participation of new researchers and students. These goals are being achieved, as reflected in the large overlap in membership between working groups, which shows the widened interests of researchers, and in the mix of researchers from different areas and of senior and new researchers. Most working groups are a good mix of mathematicians, modelers, statisticians, and scientists from specific areas, whether attending in person or remotely. This has resulted in an extraordinary enrichment of all involved: learning simultaneously about the science behind the computer model, the mathematics and engineering that translate science into models, the numerical implementation, and the statistical learning from field data is an extremely instructive exercise, in which the limitations and potential of each component of the overall use of the model are exposed and analyzed, not in isolation, but in connection with all the other components. This is the best (maybe the only) scientific road to the assessment of existing models, the development of better models, and their adequate utilization. SAMSI's goal of promoting interaction and achieving results not otherwise possible is exceptionally well demonstrated in this Program; SAMSI is playing an instrumental role in the exciting achievements of the working groups. In virtually all working groups there is also a good mix of senior people, post-docs and students.
It is remarkable that this Program has attracted exceptionally good participants; all Sub-programs have the participation of indisputable top leaders in their areas. No less important is the representation of extremely good post-docs and graduate students. The mix is resulting in very vital and motivating working groups, with the exciting activities detailed below.

The participants are also a good mix in geographical terms; most participants are not local, coming mainly from all over the USA, but also with an important representation from Europe. Similarly, minorities, in particular women, are very well represented. The Program Leaders have striven to have women represented in all working groups, workshops and activities. In particular, when invitations have been issued (for participation in different activities), adequate representation (senior/junior, geographical and minorities) has been actively pursued. Computer Models is an area where extremely few senior, influential women are active at the moment, and most of them have been asked to participate (obviously, not all of them could make it, although all showed a lot of interest in the Program). Even with this restriction, the Program has achieved a good representation of minorities in all its activities. We next briefly describe the goals, activities and achievements of the different working groups. More detailed information appears in the respective web pages.

1.3.2 Air Quality

1.3.2.1 Introduction. Research Goals

After the kick-off workshop and one meeting, the working group identified some important issues for research:

• Model calibration. For these models, no systematic studies of the tuning parameters have been done. Some sensitivity analyses are available, and some statisticians in the group will use them at first. There is a need to run the models under various parameterizations, following a design of experiments, to find good parameterizations. We plan to use Bayesian calibration in this framework. The statistical issues to be addressed this year are the nature of the outputs (space-time), the nature of the “controllable” inputs (weather), and the nature of the field data (1000 EPA stations scattered over the US, with a very specific distribution).

• Model downscaling. Can a statistical model help with downscaling, as an alternative to forecasting at a very high resolution? It clearly depends on what resolution is used for the inputs, and on the parameterizations used. We will investigate two types of statistical approaches: regression models using local ancillary variables, and spatial (or spatio-temporal) modelling of the outputs, in order to predict pollution levels at a sub-grid scale.

As a longer term goal, we intend to examine the problem of human exposure. The central question is how to use air quality model outputs for public health research, specifically epidemiological work.

1.3.2.2 Specific Activities.

The Air Quality working group held seven meetings over the Fall of 2006. A web page

http://www.samsi.info/200607/compmod/workinggroup/air/index.html

describes the topics covered and some presentations. We discussed calibration of an air quality model. S. Guillas and colleagues at Georgia Tech then ran this model for a design consisting of 100 runs based on 5 parameters: diffusion (boundary layer height), NOx, anthropogenic VOC, biogenic isoprene, and effects of clouds on photolysis. The time period is one week: July 24-31, 2005. Observations from one station in Atlanta are used to calibrate, based on data at 2pm each day (around the peak time). It turns out that NOx and the effects of clouds on photolysis ought to be tuned differently than they were in the model. The results are preliminary, since a full study using the whole summer of 2005 will be performed at Georgia Tech after computers and data storage are purchased. S. Guillas obtained a College of Sciences Faculty Research Development Grant from Georgia Tech to buy such equipment. The question of the calibration of functional outputs is still under investigation, with recent advances by Fei Liu. This research is new since no calibration of an air quality model had been performed in the past, and hence it could well turn into a “research nugget”. On the topic of downscaling, our work has now been submitted, and three studies are under way:

1. S. Guillas, J. Bao, Y. Choi, Y. Wang. Downscaling of chemical transport model ozone forecasts over Atlanta. A two-step regression approach for time series (no spatial component) is carried out. Deficiencies of 3-D model results are identified and corrected. Evaluation using measurements for a different period confirms that the statistically adjusted outputs reduce forecast errors by up to 25%.

2. S. Guillas, Chunsheng Ma. Space-time downscaling of regional ozone forecasts with nonseparable covariance models. In this work we introduce a new type of space-time covariances that enable us to better fit the differences between model outputs and observations. The resulting downscaling should improve on the result for time series.

3. S. Guillas, A. Gelfand and S. Sahu, Bayesian downscaling of an air quality model. In this work, uncertainties are naturally assessed by the Bayesian approach. Meteorological variables are used to locally improve the regional forecasts, using strong priors to give preference to the chemistry transport model.

Dana Draghicescu (CUNY-Hunter College) was invited by the working group to give a presentation on modeling and prediction of probability distribution functions and quantiles for space-time environmental processes. A collaboration between S. Guillas and Dana Draghicescu has started on the topic of quantile maps for air quality assessment.

1.3.3 Calibration of Computational Models of Cerebral Blood Flows

1.3.3.1 Introduction. Research Goals

The long term goal of the effort is to explain the autoregulation mechanisms taking place in the vascular system in response to outside stimuli (such as posture changes from sitting to standing). The group has focused its attention on one particular subsystem of the cerebral vasculature: the Circle of Willis. This network of about sixteen vessels plays a key regulation role. Further, its topology may differ from patient to patient. It is expected that an improved understanding of the mechanical properties, both fluid and elastic, of this subnetwork will also lead to better predictive capabilities regarding risks of strokes. After the kick-off workshop, the working group identified some important issues for initial research:

• Model selection. A lot of recent research in hemodynamics has focused on detailed modeling/calculation of localized features of blood flows, such as flows in the vicinity of aneurysms. The present goal is quite different. A more systemic and global approach has to be considered in order to take into account circulation in the entire Circle of Willis. Further, the model has to be simple enough to allow patient-dependent topological changes to be easily taken into account, ruling out full three-dimensional approximations. A pseudo one-dimensional approach has been chosen in which blood is considered as a non-Newtonian fluid and visco-elastic reactions of the vessels are taken into account.

• Mathematical and numerical issues. The mathematical structure of the equations corresponds to systems of nonlinear hyperbolic balance laws which are linked, from vessel to vessel, by boundary conditions. This highly nonstandard type of problem has recently received a lot of attention in other fields (communication networks, traffic flows, pipeline network management, etc.). Pseudo-spectral methods are used as solvers.

• Model calibration. One of the main goals of the working group is model calibration. A key point that differentiates our work from that of other groups in this field is our direct access to high quality data through collaboration with Dr. Novak’s group. Ensemble Kalman filtering techniques are being used for calibration purposes, as several material properties are not accessible to measurement. Preliminary results are very promising.
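As a rough illustration of how ensemble Kalman filtering can be used to calibrate properties that cannot be measured directly, the sketch below applies a single stochastic (perturbed-observation) analysis step to an ensemble of parameter values. The forward model `forward`, the parameter name `stiffness`, and all numerical values are hypothetical stand-ins chosen for illustration; they are not the working group's blood flow model or data.

```python
import numpy as np

def enkf_update(params, predicted, observed, obs_var, rng):
    """One stochastic EnKF analysis step for parameter estimation.

    params:    (n_ens, n_par) ensemble of parameter vectors
    predicted: (n_ens, n_obs) model outputs for each ensemble member
    observed:  (n_obs,) measured data
    obs_var:   scalar observation-error variance (assumed known)
    """
    n_ens = params.shape[0]
    p_anom = params - params.mean(axis=0)
    y_anom = predicted - predicted.mean(axis=0)
    # Sample cross-covariance (parameters vs. outputs) and output covariance.
    cov_py = p_anom.T @ y_anom / (n_ens - 1)
    cov_yy = y_anom.T @ y_anom / (n_ens - 1)
    gain = cov_py @ np.linalg.inv(cov_yy + obs_var * np.eye(len(observed)))
    # Perturb the observations so the updated ensemble keeps a sensible spread.
    noise = rng.normal(0.0, np.sqrt(obs_var), size=predicted.shape)
    return params + (observed + noise - predicted) @ gain.T


def forward(stiffness):
    # Hypothetical toy forward model: two observables, linear in the parameter.
    return np.column_stack([2.0 * stiffness, 0.5 * stiffness + 1.0])


rng = np.random.default_rng(0)
true_stiffness = 3.0
observed = forward(np.array([true_stiffness]))[0]
prior = rng.normal(1.0, 2.0, size=(200, 1))       # prior ensemble
posterior = enkf_update(prior, forward(prior[:, 0]), observed, 0.01, rng)
```

In this toy setting the posterior ensemble concentrates near the value used to generate the synthetic observations. A real application would repeat the update sequentially as new measurements arrive and would replace `forward` with runs of the actual simulator.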

1.3.3.2 Specific Activities.

Some specific activities for the working group include:

o Post-processing of raw data from Dr. Novak’s group. Due to the measurement technique, flow properties (namely velocity) can only be recorded one vessel at a time. The set of available data is thus not synchronized. Various statistical issues linked to this fact are being investigated.

o Choice of a model calibration method. So far, all calculations have involved ensemble Kalman filtering techniques. A discussion has been started regarding the suitability of other possible methods.

o Boundary conditions. The conditions at the boundary between the part of the vascular network that is modeled (Circle of Willis) and the rest of the network are delicate. A comparison of two different ad hoc models is under way. The suitability of more complex conditions (derived from first principles) is under discussion.

1.3.4 Climate and Weather

Numerical models are vital to simulate geophysical, chemical and ecological processes and to understand the relationship among components in the Earth system. As models have become larger and more complex, their construction, validation and analysis are no longer amenable to simple approaches and statistical summaries. Statistical science in the past 20 years has advanced to handle the interpretation of complicated multivariate, spatial and temporal data sets and it is well suited to tackle the massive outputs from numerical experiments that are now the norm in the geosciences. This SAMSI working group in partnership with the National Center for Atmospheric Research (NCAR) is undertaken with the goal of matching cutting edge statistical methods to the needs of geophysical model development and to make statistical scientists aware of the particular scientific issues and research in the geophysical modeling community.

1.3.4.1 Introduction. Research Goals.

An exciting development for this working group is the close collaboration between statisticians, applied mathematicians and NCAR modelers, which will result in a better understanding and characterization of uncertainty in climate and weather models. In particular, this working group will address the characterization and handling of the uncertainty in these models, problems of decision-oriented model applications, and the improvement of weather and climate forecasts. The first few meetings of this working group are being held jointly with the Climate and Weather working group of the SAMSI Fall Program on High Dimensional Inference and Random Matrices; they are mainly devoted to identifying and working on key references in the literature. Following the Joint SAMSI-NCAR Workshop in November, the activities will be more focused on specific goals and models. More specifically, this working group will work on:

• Stochastic parameterization of WRF-1D
• Data assimilation for WRF-1D
• Evaluation of physical models
• Comparison of regional climate models
• Turbulence modelling

1.3.4.2 Specific Activities

There are five working subgroups doing research on the five scientific problems described below. The groups have weekly meetings, and a joint monthly meeting with all the members of the climate modeling working group.

Project 1: Stochastic parameterization of WRF-1D
NCAR scientists: Robert Tardif and Josh Hacker
SAMSI leader: Fuentes
People involved from SAMSI: Justin Shows, Howard Bondell

The objective of this work is to build a stochastic model representing the temporal variability in the characteristics of clouds (cloud height and amount of water in the cloud), based on the statistical properties of observed clouds at the Department of Energy (DOE) Atmospheric Radiation Measurement (ARM) instrumented site in the Southern Great Plains (SGP). The model will subsequently be used to provide input to the WRF-1D atmospheric model to study the response of the modeled atmospheric boundary layer to stochastic cloud radiative forcing.

Project 2: Data assimilation for WRF-1D
NCAR scientists: Josh Hacker, Jeff Anderson
SAMSI leader: Guillaume Vernieres
People involved: Fuentes, Elaine Spiller

The WRF-1D (Weather Research Forecast) is a column model derived from the 3D WRF. One of its purposes is to estimate a few parameters that can be used in the 3D WRF for forecasting and nowcasting. We propose to estimate the uncertainties in the parameters, the dynamical parameter equations and the dynamics of the WRF-1D model using wind speed and temperature observations at a height of 10 meters. The code for the WRF-1D was migrated to the Topsail cluster at UNC. The ensemble filtering is being developed using MPI and will be designed to use the 128 nodes to which we have access, the idea being to use a large ensemble for a better representation of the pdfs associated with the optimum parameters.

Project 3: Evaluation of physical models
NCAR people: Steve Sain Shane R.
SAMSI leader: Serge Guillas
Other collaborators: Jonty Rougier

TIEGCM is a simulator of the processes in the upper atmosphere. In our experiment, a single TIEGCM evaluation computes the daily response of magnetic perturbations at the ground (H, D, and Z components), and the up-/poleward ExB drift velocity, at specified sites above the surface of the earth. We have an initial ensemble of 30 evaluations in which three parameters have been varied in a maximin Latin hypercube design. These parameters are the amplitude of the migrating tide, the phase of the tide, and the minimum electron density.
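For readers unfamiliar with maximin Latin hypercube designs, the following minimal sketch builds a 30-run, 3-parameter design by drawing many random Latin hypercubes and keeping the one whose closest pair of points is farthest apart. The crude random search and the parameter ranges are assumptions made for illustration only; the report does not state how the actual design was generated or what ranges were used.

```python
import numpy as np

def latin_hypercube(n_runs, n_params, rng):
    """Random Latin hypercube on [0, 1]^n_params: one point per stratum in each dimension."""
    strata = np.array([rng.permutation(n_runs) for _ in range(n_params)]).T
    return (strata + rng.random((n_runs, n_params))) / n_runs

def min_pairwise_distance(design):
    """Smallest Euclidean distance between any two design points."""
    diff = design[:, None, :] - design[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist[np.triu_indices(len(design), k=1)].min()

def maximin_lhs(n_runs, n_params, n_candidates=500, seed=0):
    """Crude maximin search: keep the best of many random Latin hypercubes."""
    rng = np.random.default_rng(seed)
    candidates = (latin_hypercube(n_runs, n_params, rng) for _ in range(n_candidates))
    return max(candidates, key=min_pairwise_distance)

unit_design = maximin_lhs(n_runs=30, n_params=3)
# Hypothetical parameter ranges, for illustration only (not the TIEGCM values).
lower = np.array([0.0, 0.0, 1000.0])
upper = np.array([1.0, 12.0, 10000.0])
design = lower + unit_design * (upper - lower)
```

Each row of `design` is one simulator run. Widening `lower` and `upper` and regenerating the design is one way to extend the range over which the parameters are varied.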

Our objective is to calibrate up to seven of the parameters, including the three mentioned above. Our initial ensemble does not span the observational data we have at each site, suggesting that we need to increase the range over which the three parameters are varied, and/or vary the remaining four parameters. To this end, our first task is to emulate the TIEGCM response to the three model parameters. This will allow us to judge whether moving these parameters beyond their current limits will improve the calibration, and, in conjunction with the modelers, whether such a move is ’physical’. Emulating the simulator response poses a number of interesting and unusual challenges. The outputs at a given site are a periodic function of time. Similarly, the emulator’s response at different sites must conform to the geometry of the surface of the earth. One plan to address this is to use regressors that are a tensor product of Fourier and Spherical Harmonics basis functions, the latter with higher order in the zonal terms to account for the stronger effect of latitude. This is quite a demanding task, particularly if we attempt to impose the same kind of structure on the emulator residual process. However, it has the advantage of combining information from multiple sites (up to twenty-five); there is clearly some smoothness in the TIEGCM outputs. An alternative plan is to emulate the TIEGCM response to the model parameters at each site, which avoids the need for spherical harmonics, and might be able to take advantage of functional modelling for the time response. We are pursuing both of these plans.
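The time-periodic part of such regressors can be sketched as an ordinary least-squares fit on a Fourier basis. The example below covers only the time dimension at a single site (no spherical harmonics and no residual process), and the daily signal is synthetic; the function name `fourier_basis` and the 24-hour period are assumptions made for illustration.

```python
import numpy as np

def fourier_basis(t_hours, n_harmonics, period=24.0):
    """Design matrix with a constant column plus cos/sin pairs of the given period."""
    t = np.asarray(t_hours, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * t / period
        cols += [np.cos(angle), np.sin(angle)]
    return np.column_stack(cols)

# Synthetic stand-in for one simulator output at one site (not TIEGCM data).
t = np.arange(0.0, 24.0, 0.5)
signal = 5.0 + 3.0 * np.cos(2 * np.pi * t / 24.0) - 1.5 * np.sin(4 * np.pi * t / 24.0)

X = fourier_basis(t, n_harmonics=2)
coef, *_ = np.linalg.lstsq(X, signal, rcond=None)
# coef is ordered [const, cos1, sin1, cos2, sin2]
```

A tensor-product basis across sites would multiply each of these time columns by each spatial basis function evaluated at the site location, which is what makes the multi-site plan more demanding than the site-by-site alternative.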

Project 4: Comparison of regional climate models
NCAR: Steve Sain
SAMSI leader: Cari Kaufman
Others: Brian Reich, Howard Bondell, Jonty Rougier

The NARCCAP project is running a variety of 50km resolution regional climate models (RCMs) for North America, with boundary conditions provided by a variety of global climate models (GCMs) and reanalysis datasets (interpolated data products based on observations). This experimental design lends itself to a functional ANOVA approach, in which the output response can be decomposed into main effects and interactions. This allows us to partition variability in the model response as a diagnostic technique for comparing models. The SAMSI group working on this project has obtained data from a similar project giving temperature output over the UK in a 2 RCM x 2 GCM x 2 CO2 scenario experiment. Some preliminary analysis, collapsing over space, shows that the largest source of variability is the CO2 scenario, followed by GCM, followed by RCM. That is, the two RCMs have a high degree of agreement, given a particular GCM and scenario. This group is extending this analysis to model spatial surfaces rather than overall means. Currently the group is exploring 1-way functional ANOVA modelling using Gaussian process priors on the mean functions using simulated data, and studying the connections between this method and existing methods using splines.
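The partition of variability after collapsing over space can be illustrated with a classical (scalar, not functional) ANOVA decomposition of a 2 x 2 x 2 array. The numbers below are made up so as to mimic the qualitative ordering reported above; they are not the project's data, and the helper name is hypothetical.

```python
import numpy as np

def main_effect_sums_of_squares(y, factor_names):
    """Sum of squares attributed to each main effect of a full factorial array."""
    grand_mean = y.mean()
    result = {}
    for axis, name in enumerate(factor_names):
        other_axes = tuple(a for a in range(y.ndim) if a != axis)
        level_means = y.mean(axis=other_axes)
        n_per_level = y.size // y.shape[axis]
        result[name] = n_per_level * ((level_means - grand_mean) ** 2).sum()
    total = ((y - grand_mean) ** 2).sum()
    # Whatever the main effects do not explain is lumped into interactions.
    result["interactions"] = total - sum(result.values())
    return result

# Made-up mean temperatures indexed as [RCM, GCM, CO2 scenario]; not project data.
y = np.array([[[10.0, 13.0], [11.0, 14.2]],
              [[10.1, 13.1], [11.2, 14.3]]])
ss = main_effect_sums_of_squares(y, ["RCM", "GCM", "CO2"])
```

The functional version replaces each scalar cell with a spatial surface, so that main effects and interactions become functions of location rather than single numbers.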

Project 5: Turbulence modelling
NCAR: Pablo Mininni
SAMSI leader: Chunsheng Ma

This group is developing log Gaussian frameworks for turbulence modelling.


Some papers have been submitted during the Program or are under preparation:

• Hacker, J. P., J. L. Anderson, and M. Pagowski, 2007: Improved vertical covariance estimates for ensemble-filter assimilation of near-surface observations. Mon. Wea. Rev., 135, 1021-1036.

• Hacker, J. P. and D. Rostkier-Edelstein, 2007: PBL state estimation with surface observations, a column model, and an ensemble filter. Mon. Wea. Rev., accepted.

• Kaufman, C. and Sain, S. Functional ANOVA modeling of regional climate model experiments (manuscript in preparation).

• Shows, Fuentes, Bondell, Hacker and Tardif. Stochastic parameterization of WRF-1D (manuscript in preparation).

1.3.5 Dynamics of Infectious Diseases

1.3.5.1 Specific Activities

This working group has scheduled four monthly meetings on the following dates: January 15, 2007, held in the Center for Research in Scientific Computation; February 16, 2007, held at SAMSI; March 16, 2007, to be held at SAMSI; and April 13, 2007, to be held at SAMSI.

On Jan. 15, we had an organizational meeting and decided on the dates of the following three monthly meetings. During the second part of the morning we had a hands-on exploration of several public sources of longitudinal epidemiological data, including: Centers for Disease Control and Prevention, European Influenza Surveillance Scheme, Canadian Centre for Infectious Disease Prevention and Control, Sentinelle Network and Sentiweb, and World Health Organization. Links to these databases can be found at: http://www4.ncsu.edu/acintro/samsi_dyn_infc/data_sources.html In addition, two discussions were held on Jan. 15 covering population-level as well as within-host dynamics, with the following titles and facilitators:

1) “Introduction to deterministic epidemiological models: linear stability analysis, basic reproductive numbers, and sensitivity analysis” led by Ariel Cintron-Arias.
2) “Introduction to within-host dynamics: estimates of reproductive numbers obtained from patient viral load counts” led by Alun Lloyd.
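As a small illustration of the kind of deterministic epidemiological model covered in the first discussion, the sketch below integrates the standard SIR model and computes its basic reproductive number, R0 = beta / gamma. The rates, initial conditions and forward-Euler scheme are assumptions chosen for illustration; they are not taken from the meeting materials.

```python
def simulate_sir(beta, gamma, s0, i0, days, dt=0.01):
    """Forward-Euler integration of the SIR model in population fractions."""
    s, i, r = s0, i0, 1.0 - s0 - i0
    peak_infected = i
    for _ in range(int(days / dt)):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak_infected = max(peak_infected, i)
    return s, i, r, peak_infected

beta, gamma = 0.5, 0.25          # hypothetical transmission and recovery rates per day
r0 = beta / gamma                # basic reproductive number
s_end, i_end, r_end, peak = simulate_sir(beta, gamma, s0=0.999, i0=0.001, days=200)
```

Linearizing about the disease-free state gives di/dt ≈ (beta - gamma) i, so an outbreak grows only when R0 > 1. With the values above, R0 = 2 and most of the population is eventually infected.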

During the Feb. 16 meeting there were five discussions addressing gene regulatory networks, algebraic models, graph dynamical systems, estimation of influenza reproductive numbers, and demographic stochasticity in epidemic models, with the following titles:

1. “Pathosystems biology: an introduction to discrete models of cellular networks” led by Reinhard Laubenbacher.
2. “Polynomial dynamical systems” facilitated by Abdul Salam Jarrah.
3. “SIR Models over graphs, graph dynamical systems and the EpiSims project” led by Henning S. Mortveit.
4. “Estimation of seasonal effective reproductive numbers of influenza A(H3N2)” led by Ariel Cintron-Arias.
5. “Demographic stochasticity in epidemic models” facilitated by Alun Lloyd.

1.3.6 Engineering Methodology

As applied mathematical models of physical phenomena become more sophisticated, it is increasingly common practice for engineering decisions to be made either with the assistance of such models or even based solely on the implementation of these mathematical models in (complex) computer codes. Such codes are termed “computer models” and, depending on their complexity, can be expensive to run.

1.3.6.1 Research Goals.

There are many important issues that must be addressed before the computer model output can be used for engineering decisions. First, the computer code must be validated, meaning that it has been verified that the code correctly solves the mathematical model over some range of operating conditions; users can then have faith that the computer code correctly solves the mathematical model for the phenomenon of interest. Second, information about unknown model parameters must be collected in the form of prior distributions for these parameters. Third, the parameters in the code that are also present in a physical experiment for the same phenomenon must be calibrated, and the numerical tuning parameters in the computer code must be set. Finally, engineering goals can be addressed based on the calibrated code. Important examples of such goals are:

• Screening for high-dimensional inputs. How does one screen large numbers of inputs to identify active factors? How do we design the computer experiment to facilitate this screening? How does one perform emulation based on highly correlated real-valued or functional output? For example, for the latter issue, one might simply reduce the number of training data points used to predict. How does one choose these?

• Hierarchical Validation. How does one do full-system validation based on a set of sub-system model validations? This is critical when full-system data is rare or impossible to collect but sub-system data is available.

• Calibration of Computer Codes. In functional cases one reduces dimensionality by expressing computer model output using a basis function. There are several proposals


• Validation of Computer Codes. Are there better choices of calibration models that make the tuning parameter and computer model bias terms more orthogonal than the model proposed by Kennedy and O’Hagan (2001)? These quantities are highly correlated and can be difficult for MCMC methods to estimate accurately.
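The calibration setting referred to in the last bullet is often written in a form like the following. This is a simplified rendering in notation chosen here for illustration (the symbols are assumptions, not taken from the report, and the original formulation of Kennedy and O’Hagan also includes a scaling factor on the computer model term):

```latex
y^{F}(x_i) \;=\; y^{M}(x_i, \theta) \;+\; b(x_i) \;+\; \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2), \quad i = 1, \dots, n
```

Here $y^{F}(x_i)$ is a field observation at input $x_i$, $y^{M}(x, \theta)$ is the computer model output with tuning parameters $\theta$, $b(x)$ is the model bias (often given a Gaussian process prior), and $\varepsilon_i$ is measurement error. The field data constrain only the sum $y^{M}(x, \theta) + b(x)$, so a change in $\theta$ can be offset by a change in $b$; this is one way to see why the two are highly correlated in the posterior.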

1.3.6.2 Specific Activities

The Engineering Methodology working group has focused on developing dynamic model approximations to computer codes of coupled systems. One application of such a model is a coupled differential equation model of the water output from a watershed area. Another potential application is to describe the deflection of the individual blades in a coupled system of rotor fan blades. The use of these models as competing emulators to standard GASP emulators, their calibration, and their accuracy as approximators to the complex computer model are all important points of research. Another project completed during the period of support was the biomechanical design of acetabular cups that are used in hip prosthesis replacements. The goal was to design these devices to be robust to surgical variability in their insertion, to variability in patient bone quality, and to variability in the loading of the hip. Other engineering projects have been undertaken by members of the hydrology group and individual graduate students associated with this program (see additional information regarding Gang Han, Dianne Bautista, and the Hydrology working group). Some papers have been submitted during the Program or are under preparation:

Submitted:
• B. J. Williams, T. J. Santner, W. I. Notz, J. S. Lehman (2006) “Sequential Design of Computer Experiments for Constrained Optimization.”
• K. L. Ong, T. J. Santner, and D. L. Bartel (2006) “Robust Design for Acetabular Cup Stability Accounting for Patient and Surgical Variability.”

In Preparation:
• Gang Han, T. J. Santner, W. I. Notz, and D. L. Bartel “Prediction for Computer Experiments Having Quantitative and Qualitative Input Variables.”

1.3.7 Granular Materials -Engineering Applications

1.3.7.1 Introduction. Research Goals.

The working group has identified the following main global goals which are being actively pursued during the year:


• Experimental design, calibration and forecasting of granular avalanches. The preliminary focus was on using the Bayes linear methodology, but other methodologies have been explored in collaboration with the Methodology working group. Other applications, including stress loads and flows in bins and hoppers, will be considered.

• Extreme events. Included in this work is a focus on extreme events (those large but rare pyroclastic flows and lahars that cause widespread devastation) as well as on developing quantitative statistical tools that inform hazard assessment.

• Model comparison and evaluation. Interfacing with the methodology group, an important goal of this working group is to develop a rational basis for comparing models of avalanches or bin loads and flows that include different physics as well as other phenomenological modeling factors, to yield an integrated modeling effort that offers better predictive capabilities.

There are several measures of success in this working group: successful application of a Bayesian approach to the issues of computational experimental design, calibration and forecasting of avalanches/pyroclastic flows; and application of statistical ideas on variation and uncertainty to engineering problems of stresses on and flows inside bins and hoppers. Papers and presentations on these ideas give specific metrics.

1.3.7.2 Specific Activities

During the course of the year, the group gravitated towards the Soufriere Hills volcano on Montserrat as a source for data. This volcano began its current eruption cycle in 1995, and has produced a few very large flows and many (literally hundreds) of smaller flow events. These eruptions have caused dozens of fatalities, and the evacuation of the southern third of the island, including the capital, Plymouth. Eliza Calder, a geologist at Buffalo, has collected and analyzed this data, and shared her expertise with us along the way. In January 2007, a new dome began to grow at the summit of the volcano, and geo-scientists anticipate another large eruption soon. Flow simulations at Montserrat, using the most recent digital elevation data, and with an eye toward developing a revised hazard map, are being developed. The following specific activities, corresponding to some of the goals above, are being pursued during the year:

• Experimental design, calibration and forecasting. One research goal for the group is in the application of Bayes linear methodology for experiment design, calibration and forecasting of granular avalanches. Dalbey has extended earlier work by Patra, Dalbey and Pitman, to develop an adaptive methodology for constructing an emulator. Comparison with Gaussian Process work by Spiller demonstrates the strengths and weaknesses of both approaches.

• Extreme events. Another goal is an examination of extreme events to inform hazard assessment. Spiller is investigating ‘rare events’ in the context of mass flows at Montserrat, an active volcano that is erupting regularly. In related work, Wolpert and Luna-Gomez have developed a model of eruptions as Lévy processes that allows sampling directly from extreme events while at the same time computing the probability that these extreme events occur in a given period of time.

• Sampling key parameters. Pitman and Patra are developing analytic and computational approaches to sampling some of the principal parameters in mass flow models, in particular, friction angles.
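To make the idea of "the probability that an extreme event occurs in a given period" concrete, the sketch below uses a deliberately simplified stand-in: flows arrive as a Poisson process and their sizes follow a Pareto distribution (a compound Poisson process, which is one simple kind of Lévy process). This is not the model developed by the group, and every number and function name is hypothetical.

```python
import math
import random

def prob_exceedance(rate, alpha, v_min, threshold, years):
    """P(at least one flow larger than `threshold` occurs within `years`)."""
    tail = (threshold / v_min) ** (-alpha)       # Pareto tail probability
    return 1.0 - math.exp(-rate * years * tail)

def simulate_exceedance(rate, alpha, v_min, threshold, years, n_sims, seed=0):
    """Monte Carlo check: simulate Poisson arrivals with Pareto-distributed sizes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        t = rng.expovariate(rate)
        exceeded = False
        while t < years:
            size = v_min * rng.paretovariate(alpha)
            if size > threshold:
                exceeded = True
                break
            t += rng.expovariate(rate)
        hits += exceeded
    return hits / n_sims

# Hypothetical values: 20 flows per year, sizes in units of v_min, 10-year horizon.
p_exact = prob_exceedance(rate=20.0, alpha=1.5, v_min=1.0, threshold=100.0, years=10.0)
p_sim = simulate_exceedance(20.0, 1.5, 1.0, 100.0, 10.0, n_sims=2000)
```

The closed-form expression follows from Poisson thinning: flows above the threshold form a Poisson process with rate `rate * tail`, so the chance of none in the period is `exp(-rate * years * tail)`. The simulation is included only as a consistency check.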

Papers about these four research activities are in various stages of development. The group is participating in the April SAMSI/MUCM workshop. A presentation on a subset of these research activities will be made to the ASA meeting in July.

1.3.8 Inference and Uncertainty Analysis of Hydrological Models

1.3.8.1 Introduction

A major problem in hydrological models, and (with increasing temporal resolution of data) also in ecological models, is the systematic deviation of the predictions of deterministic models from data. This invalidates simple statistical assumptions on the distribution of the error term and makes uncertainty estimates of inferred model parameters and derived model results unreliable. Systematic measurement errors due to miscalibration of river flow gauges are usually not a main contributor to these problems. The major causes are (i) input uncertainty, (ii) model structure deficits, and (iii) the inadequacy of a deterministic description of the hydrological system.

1.3.8.2 Research Goals

We divide our activities into three (partially overlapping) research areas.

(a) Bias reduction with time-dependent parameters. Describing model bias statistically through a bias term or model inadequacy function (Kennedy and O’Hagan, 2001) makes it possible to consider model bias explicitly and to improve the reliability of parameter estimates (as the statistical assumptions are no longer severely violated), and it provides limited support for analyzing the cause of the bias. However, because it focuses exclusively on model output and does not attempt to describe an intrinsic mechanism of the system described by the model, it is hard to use for a mechanistic identification of the cause of the model structure deficit and for extrapolation. This approach was recently extended for time-dependent models (Tomassini et al., submitted) by attempting to represent the bias within the model itself, making selected parameters time-dependent stochastic processes. An analysis of the identified time dependence of the parameter (by trying to relate it to external and internal model variables) can facilitate the analysis of mechanistic causes of model bias. The unexplained contribution to the bias can be considered by incorporating the stochastic process for the parameter that is most likely not to be deterministic in the model formulation. If both of these contributions can be identified correctly, this can be expected to lead to improved model predictions compared to a bias term in model output. In this research field of the SAMSI program we are implementing a simple hydrological model that serves as a research tool to explore this methodology, with applications to more complex hydrological watershed models in mind.

(b) Increasing the efficiency of time-dependent parameter estimation. The disadvantage of the estimation procedure for time-dependent parameters used in research area (a) is its relatively poor computational performance. For complex computer models, it will be hard to apply any Markov chain Monte Carlo procedure, as it will not be possible to perform a sufficient number of simulations to get a reasonable representation of the posterior distribution. For this reason, in this research field of the SAMSI program, in close collaboration with the working group on methodology, we are investigating linearization approaches that we hope will improve the performance of the algorithms (see research field (c) below for an alternative approach).

(c) Emulating hydrological models. Statistical emulators have been proposed to make Bayesian analysis possible for very computationally expensive simulation programs (Currin et al., 1991; Kennedy and O’Hagan, 2001; Santner et al., 2003). However, only a few attempts have been made so far to apply this approach to dynamic models for which relatively dense time series can be produced. In this research field of the SAMSI program, together with the working group on engineering methodology, we are attempting to construct an emulator for the simple hydrological simulation model mentioned above. The idea is to use a simple linear, discrete-time state-space model to emulate the discrete output set of the continuous-time hydrological model.
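The idea behind research area (a), letting a selected parameter vary in time as a stochastic process, can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the working group's model: a single linear reservoir stands in for the simple hydrological model, the outflow rate k(t) is the time-dependent parameter, and log k(t) follows an Ornstein-Uhlenbeck process with hypothetical values for its mean, correlation time and standard deviation.

```python
import math
import random


def simulate_time_dependent_rate(n_steps, dt, mean_log_k, tau, sigma, rng):
    """Sample k(t) > 0 whose logarithm is an Ornstein-Uhlenbeck process.

    tau is the correlation time and sigma the stationary standard deviation
    of log k (both hypothetical illustration values, not from the report).
    """
    decay = math.exp(-dt / tau)
    step_sd = sigma * math.sqrt(1.0 - decay ** 2)  # exact OU discretisation
    log_k = [mean_log_k]
    for _ in range(n_steps - 1):
        previous = log_k[-1]
        log_k.append(mean_log_k + (previous - mean_log_k) * decay
                     + rng.gauss(0.0, step_sd))
    return [math.exp(value) for value in log_k]


def linear_reservoir(rain, k_series, dt, storage0=0.0):
    """Single linear reservoir dS/dt = P(t) - k(t) S with outflow Q = k S.

    Rain and k are treated as constant within each time step, so the
    storage update below is the exact solution over one step.
    """
    storage = storage0
    discharge = []
    for p, k in zip(rain, k_series):
        decay = math.exp(-k * dt)
        storage = storage * decay + (p / k) * (1.0 - decay)
        discharge.append(k * storage)
    return discharge


rng = random.Random(1)
n_steps, dt = 200, 1.0
rain = [5.0 if 20 <= t < 30 else 0.0 for t in range(n_steps)]  # one storm

k_const = [0.1] * n_steps
k_time = simulate_time_dependent_rate(n_steps, dt, math.log(0.1),
                                      tau=20.0, sigma=0.3, rng=rng)

q_const = linear_reservoir(rain, k_const, dt)  # deterministic parameter
q_time = linear_reservoir(rain, k_time, dt)    # stochastic parameter
```

Comparing `q_time` with `q_const` shows how much output variation a single time-varying parameter can introduce. In an inference setting the path of k(t) would be estimated from data rather than simulated, which is the expensive step that research areas (b) and (c) aim to speed up.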

1.3.8.3 Research Activities and Current State of Work

We divide the progress according to the research areas introduced above.

(a) Bias reduction with time-dependent parameters. Most of the work so far has been done in this research area. The simple hydrological model was implemented and a first analysis has been performed for each of the model parameters individually. The preliminary results show that there are significant differences in the potential of different parameters to “explain” the model bias. This demonstrates that the statistical technique has the potential to support the analyst in identifying the mechanistic cause. Only very weak correlations were found between time-dependent parameters and internal and external model variables. This indicates that the potential for improvement of the mechanistic model structure is not very large in this particular application. Most of the model bias is probably caused by stochastic processes not described by the deterministic model. It seems reasonable to attribute most of this stochasticity to input uncertainty in the total amount and spatial distribution of precipitation. For this research field, a first draft of a paper is ready and is expected to be submitted for review in June.

(b) Increasing the efficiency of time-dependent parameter estimation. In collaboration with the working group on methodology, first attempts have been made to compare the complete nonlinear approach with a linearized approach that combines linearization with emulation for a simple algebraic model. See the report of the methodology working group on this topic. A slightly different approach without emulation has been discussed but not yet implemented.

(c) Emulating hydrological models. A first formulation of a simple linear, discrete-time state-space model to emulate the discrete output set of the continuous-time hydrological model has been proposed in collaboration with the engineering methodology working group. See the report of this working group for more details.

Part of the research appears in the following draft: Reichert, P. and Mieleitner, J., Analyzing input and structural uncertainty of a hydrological model with stochastic time-dependent parameters, draft.

References

• Kennedy, M.C., O’Hagan, A., Bayesian calibration of computer models, Journal of the Royal Statistical Society, Series B, 63(3), 425-464, 2001.
• Kuczera, G., Kavetski, D., Franks, S., Thyer, M., Towards a Bayesian total error analysis of conceptual rainfall-runoff models: Characterizing model error using storm-dependent parameters, Journal of Hydrology, in press.
• O’Hagan, A., Curve fitting and optimal design for prediction, Journal of the Royal Statistical Society, Series B, 40(1), 1-42, 1978.
• Santner, T.J., Williams, B.J., Notz, W.I., The Design and Analysis of Computer Experiments, Springer, New York, 2003.
• Tomassini, L., Reichert, P., Künsch, H.-R., Buser, C.H., Borsuk, M.E., A smoothing algorithm for estimating stochastic, continuous-time model parameters and an application to a simple climate model, submitted.
• Vrugt, J.A., Diks, C.G.H., Gupta, H.V., Bouten, W., Verstraten, J.M., Improved treatment of uncertainty in hydrologic modeling: Combining the strengths of global optimization and data assimilation, Water Resources Research 41, W01017, doi:10.1029/2004WR003059, 2005.

1.3.9 Methodology

The Methodology working group continues to address a range of theoretical and computational issues that arise in the development, assessment, and utilization of complex computer models, especially obstacles or opportunities for improvement encountered by more than one of the other working groups. Short presentations from a range of participants and guests were used as catalysts to help focus attention and creative efforts on emerging areas of opportunity for advancing this computational methodology.

1.3.9.1 Introduction. Research Goals.

The Methodology subgroup has identified several goals that it would like to address:

• Design for both computer model runs and proposed acquisition of field data. In low-dimensional spaces we have a clear idea of the risks and benefits of traditional ‘space filling’ designs (such as Latin hypercubes) or, conversely, of designs favoring more ‘extreme values’ of model inputs; the latter offer (seemingly) more precise determination of model parameters, while the former offer better opportunities for assessing model fit and accurate reporting of uncertainty. In higher dimensional spaces our intuition and abilities to visualize are weaker, and technical obstacles interfere with our ability to balance the twin desires to spread out data input points and to ‘fill in’ the model space; there is a need for more experience with different design strategies, particularly in high dimensional spaces.

A second goal is that of developing sequential designs, both through ‘early bail-out’ strategies based on rapid identification of infeasible regions, and through the more challenging goal of generating particularly good sequential designs to guide the choice of subsequent observations in applications where observations (either field data or runs of very costly simulators) are expensive.

• Emulators. Since computer models are usually very costly to run, an ‘emulator’ (fast simulator, surrogate) is required, usually for optimization, design and statistical analyses (calibration, validation, prediction, etc.). Traditionally, GASP (Gaussian separable process) models with simple isotropic covariance structures developed in the geostatistical sciences are the ‘default’ choice, but serious comparison with alternative processes and study of the limitations of GASPs are still lacking. Another promising possibility is the use of ‘rougher’ or ‘simpler’ computer models, or a combination of both.

• Huge spaces offer big challenges, for both input and data spaces. This includes functional input/data spaces. One obvious difficulty with the ‘traditional’ GASP is that it does not scale up to even moderate dimensions, so either imaginative numerical methods have to be used, or dimension reduction strategies have to be adopted, or another emulator has to be used in these scenarios.

• Bayesian approaches to Calibration, Validation and Prediction. A main use of computer models is to predict reality in the presence of (usually very large) uncertainties. Predictions should always be made along with ‘confidence’ bands. Also, calibration and validation of computer models have to be performed simultaneously. All these issues are best addressed through a Bayesian approach with an input-dependent discrepancy or ‘bias’ function, which naturally incorporates all the uncertainties. This has indeed been the prevalent methodology since the Kennedy and O’Hagan (2001) paper. However, the confounding between unknown parameters in model inputs and the bias function was not openly recognized and addressed until recently. This confounding raises many issues of implementation (appropriate priors), numerical analysis (MCMC mixing and convergence), and interpretation and reporting of the results.

• Prediction for untried (or altered) scenarios. Using computer models for prediction in situations in which data are lacking or very scarce lies at the very heart of the development of computer simulators. Exactly how best to extrapolate models and bias is an important area of research that is not yet solved satisfactorily. A combination of expert judgement, statistical insights and modelers’ knowledge will be needed for successful extrapolation.

• Multivariate outputs. Computer model outputs are multivariate in nature. Usually, each output dimension is addressed separately, but sometimes simultaneous consideration of multivariate output is required. GASPs and other generalizations, as well as hierarchical Bayes implementations, can be utilized and should be investigated.

• Approximations. Because of the high-dimensional input spaces, confounding, little external prior information and scarce data, the brute-force Bayes approach might not be feasible, especially for somewhat ‘automatic’ use, and suitable approximations need to be investigated. For instance, some of the parameters (especially those involved in the emulator) might have to be replaced by estimates (or by fixed values). Also, poor modeling might result in undesirable outputs, which can be alleviated by the use of Bayesian ‘modular’ approaches. The effects of these approximations, however, still have to be investigated.

Hands-on models will include both ‘small’ and/or fast simulators (amenable to comparison and to thorough exploration of methodological issues) and large, expensive simulators, which will not only foster efficient implementation but will also reveal new areas of methodological interest.
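The ‘space filling’ Latin hypercube designs mentioned under the design goal can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names, the unit-cube scaling and the simple maximin selection over random candidates are choices made here for illustration, not a method prescribed by the working group.

```python
import math
import random


def latin_hypercube(n_points, n_dims, rng):
    """Return n_points design points in the unit cube [0, 1)^n_dims.

    Each dimension is cut into n_points equal strata, and every stratum
    is used exactly once in every dimension.
    """
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_points))
        rng.shuffle(strata)
        # Place one point uniformly at random inside each stratum.
        columns.append([(s + rng.random()) / n_points for s in strata])
    return [tuple(column[i] for column in columns) for i in range(n_points)]


def min_pairwise_distance(design):
    """Smallest Euclidean distance between any two design points."""
    return min(math.dist(a, b)
               for i, a in enumerate(design)
               for b in design[i + 1:])


def maximin_latin_hypercube(n_points, n_dims, n_tries, rng):
    """Among n_tries random Latin hypercubes, keep the most spread-out one."""
    candidates = (latin_hypercube(n_points, n_dims, rng) for _ in range(n_tries))
    return max(candidates, key=min_pairwise_distance)


design = maximin_latin_hypercube(n_points=10, n_dims=3, n_tries=50,
                                 rng=random.Random(0))
```

Each row of `design` would be rescaled to the simulator's input ranges before running the model. The stratification guarantees good one-dimensional coverage, while the maximin step is one simple heuristic for the harder problem noted above of spreading points in higher dimensions.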

1.3.9.2 Specific Activities

• Design Visitor Henry Wynn (Statist., London School of Economics, UK) presented pioneering work developing algebraic methods to support design, and new methods for the rapid identification of feasible regions in design space. Visitor David Steinberg (Statist. & Oper. Res., Tel Aviv University, Israel), an expert on design, presented a group of lectures on new ways of generating promising designs for computer experiments. Elaine Spiller (SAMSI post-doc) presented a reanalysis of simulated avalanche and volcanic flow data and designs from the work of Bruce Pitman (Mathematics, University of Buffalo) and colleagues, illustrating how sharp discontinuities in the parameter space may warrant special attention in the design phase.

• Emulators Participant Herbie Lee (Applied Math. and Statist., UC-Santa Cruz) presented his work on “treed Gaussian processes” intended to overcome the isotropy limitation of GASPs. Jim Crooks (SAMSI/Duke post-doc) presented motivating problems from the SAMSI Terrestrial working group, where GASPs (and other Gaussian methods) seem inappropriate. Robert Wolpert (ISDS, Duke Univ.) presented novel non-Gaussian methods that may offer an alternative to GASPs.

• Huge spaces Bruce Pitman (Mathematics, Univ. Buffalo) presented complex PDE models for avalanche modeling, along with early efforts to develop simulators (based

Several subject-area specialists presented their high-dimensional models to help the Working group appreciate the breadth of these problems and look for common themes – these include SAMSI visitor Peter Reichert (Environ. Sci, ETH, Switzerland) speaking on hydrology problems (including the need for time-dependent parameters, with the potential of increasing model dimensions dramatically); Ken Reckhow (Environ. Sci, Duke Univ.) speaking on water quality models; Bernt Mueller (Physics, Duke Univ) speaking on models for high-energy nuclear collisions; Jim Clark (Environ. Sci and Biology, Duke Univ) speaking on dynamic terrestrial models. Doug Nychka (head of IMAGe at NCAR) arranged for remote presentations on several aspects of climate modeling by his colleagues at NCAR (Johannes Feddema of Univ Kansas on anthropogenic land cover change experiments; Pablo Mininni of NCAR on statistical properties of turbulent flows; Hanli Liu, Art Richmond and Michael Wiltberger of NCAR, on upper atmosphere modeling issues; and Joshua Hacker and Gordon Bonan of NCAR on planetary boundary layer uncertainties).

• Bayesian approaches to Calibration, Validation and Prediction Michael Goldstein (Math. Sci, Univ Durham, UK) presented several lectures on the Bayes Linear approach to statistical modeling, and the Working Group sought similarities and contrasts with the GASP model emulator approach to computer model validation and assessment. Tony O’Hagan (Statistics, Univ. Sheffield, UK) offered a series of talks exploring the approach pioneered in Kennedy and O’Hagan (2001).

• Prediction for untried (or altered) scenarios. The work by Bruce Pitman and Abani Patra (Mech. & Aero. Eng., Univ. Buffalo) and its reanalysis by Elaine Spiller (SAMSI post-doc) helped illuminate the obstacles one faces in interpolation and extrapolation of emulators and simulators, in a problem with sharp discontinuities. New methods are needed to raise cautionary ‘red flags’ when such discontinuities are recognized during emulation and simulation.

Wolpert and Simon Lunagomez (ISDS, Duke Univ.) presented early versions of models and methods for using volcanic eruption data from Pitman, Patra, and Eliza Calder (Geology, Univ Buffalo) to make long-term predictions of the rate of cataclysmic eruptions based on mid-term data of a wide range of eruption magnitudes.

• Multivariate outputs. Lenny Smith (Maths, Oxford Univ, UK) presented climate models with coupled outputs, illuminating a range of ways that models may be uncertain in their parametric aspects or predictions, and showing how multidimensional methods may be needed.

• Approximations. Cari Kaufman (SAMSI/NCAR postdoc) presented a dimension-reduction technique called ‘covariance tapering’ in which model Gaussian covariance functions are approximated by much sparser ones, leading to computationally tractable approximations for high-dimensional problems. A number of possibilities for extension of this work emerged from discussions within the Working Group.

• Use of derivative data for model calibration and prediction. Lately, new tools and software are being developed so that derivative data are produced along with the simulator data. How best to use these data is an important issue. The most common use of derivative data is for design. We are developing a new approach which employs them for alternative bias specification, calibration and prediction. Indeed, the usual emulators are very good for interpolating but not for extrapolating. Presumably, getting the bias function inside the emulator would borrow a bit of the physics and hence improve extrapolation. Unfortunately, this is infeasible for complex simulators. However, a linearization of the problem along with derivative data provides much better emulators along with a more structured bias function, in part driven by the simulator. This approach is being actively pursued by S. Bayarri, J. Berger and R. Paulo, who gave a presentation to the working group. Although there is hope that this will greatly improve extrapolation, new and challenging numerical implementation issues arise.
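For orientation, the bias-function formulation that runs through this section can be written compactly. The notation below is chosen here for illustration and is not taken from the working group's papers: y^F is a field observation at input x, y^M is the simulator run at x with calibration parameters θ, b is the input-dependent bias, and ε is observation error. The second line is only the generic first-order Taylor expansion that any linearization using derivative output would start from; the specific construction pursued by the authors named above may differ.

```latex
% Field data = simulator output + input-dependent bias + observation error
y^{F}(x) = y^{M}(x, \theta) + b(x) + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma^{2})

% Generic first-order expansion about a nominal value \theta_{0},
% using the derivative output \nabla_{\theta} y^{M} of the simulator
y^{M}(x, \theta) \approx y^{M}(x, \theta_{0})
  + \nabla_{\theta} y^{M}(x, \theta_{0})^{\top} (\theta - \theta_{0})
```

The first equation also makes the confounding issue visible: a change in θ can be offset by a compensating change in b(x), so the data alone cannot fully separate the two without prior information.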

1.3.10 Statistical Mechanics of Granular Flow

The main theme of this working group is to investigate whether it is possible to deduce continuum behavior from statistical mechanics analysis of grain interaction. The distinguishing feature of granular flow by comparison to normal fluid mechanics is that the control volume over which we seek a continuum behavior is a mesoscopic system which contains a much smaller number of particles (10^3) than is typically found in fluid mechanics (10^23). Such mesoscopic systems are difficult to treat using standard methods of statistical mechanics. However, changing the focus to the statistical mechanics of the possible intergranular interactions allows the investigation of a system with a large number of components (10^20). To this end, a number of research goals which attempt to elucidate the intergranular interaction have been formulated.

1.3.10.1 Introduction. Research Goals.

The following themes are actively being pursued by the group:

• Numerical simulations of few-grain interactions. All current particle interaction algorithms use a simplified grain interaction model based upon an analytical treatment of the collision of two isolated grains. The idealized situation treatable by analytical methods is known not to hold in dense granular flows. To supplement the isolated two-body interaction, numerical simulations have been carried out of interactions between multiple grains (from 3 to 10 grains in the system) with full explicit solution of the elasticity equations. The objective is to allow direct numerical simulation to inform models of granular physics, especially as regards segregation and fluctuations.

• Segregation and fluctuations. A new viewpoint on segregation and fluctuation starting from analysis of possible integrated interactions is under investigation. The attractive feature of this approach is the applicability of the standard tools of statistical mechanics. The relevance of the approach to the problem of deducing continuum granular flow behavior is not yet decided at this point.

1.3.10.2 Specific Activities

A number of meetings were held early on to allow communication between experimental researchers and theorists. The main result of this communication was a better appreciation of the difficulties inherent in describing the detailed physics of the grain interaction process. In particular, the best characterized experimental systems are formed by plastic disks that polarize light passing through them and thus allow visualization of the internal stress field (Behringer). The disks exhibit viscoplastic behavior. This has led to the problem of developing a simulation capability for such materials (Mitran) in order to first investigate two-body behavior in detail. As of March, 2007 the basic two-body problem has been solved. The next stage is to build a sufficiently extensive database of simulations that can then be compared to experimental results and afterwards inform theoretical treatments of segregation and fluctuation.

1.3.11 Systems Biology

Systems Biology is an exciting new area of scientific research, made possible only in recent years due to parallel developments in computational and experimental disciplines. In particular, advances in molecular biology technology have enabled the study of cellular function and dynamics at levels of detail previously impossible. This has given new insight into biochemical network dynamics, and opens up the possibility of developing computational models of key aspects of cellular function. Such models have huge potential for furthering our understanding of biological processes; among other things, functional foods, rational drug design, personalized medicine and combination therapies are all targets for systems biology approaches.

1.3.11.1 Introduction. Research Goals.

There are a number of challenges in the way of building useful dynamic models of biological processes. Some are experimental, some are computational, but many are mathematical and/or statistical, and it is these mathematical and statistical challenges in particular that are the focus of this WG. Particular issues include choosing an appropriate kind of model: spatial/non-spatial, deterministic/stochastic, discrete/continuous in time/space. Then, given a particular choice, how can we choose and validate the processes being modelled, and the form of rate equations to be used? Given these, how can we use very indirect and noisy experimental measurements to estimate key model parameters? How do we reconcile top-down descriptive statistical models fitted to available data with bottom-up mathematical models of the underlying mechanisms? What implications do new experimental technologies have for model development, choice, and assessment? What are the key mathematical and statistical developments required to advance the field of systems biology most rapidly? The workshop (described below) identified effective parameter estimation and the associated identifiability and model validation issues as a major obstacle to efficient model-building. This therefore represents the primary initial focus of the WG. Multi-scale modelling and simulation was also identified as requiring urgent development effort, and this is to be the subject of related activity.

1.3.11.2 Specific Activities

The WG formed out of the Biosystems Modeling workshop (detailed below), and meets weekly at SAMSI, involving SAMSI visitors and locals, as well as other participants via teleconference/internet. The WG is initially focusing on parameter estimation problems encountered by two of the participants. The first is a deterministic biochemical network model arising from work being carried out at the Environmental Protection Agency. Some simple parameter estimation methods have already been applied to the model (and associated data). The working group is now examining the application of more sophisticated Bayesian analysis techniques to the problem in order to better understand the parameterization issues. The WG is currently benefiting from the visit to SAMSI of Richard Boys and Daniel Henderson (Newcastle U.), experts in estimation and calibration of both deterministic and stochastic biological models. The second model is a stochastic model of molecular motor dynamics being developed at Cornell. Parameter estimation for complex stochastic kinetic models is still a cutting-edge research problem, and this will therefore present a more formidable challenge to the working group. This model will be the subject of an intensive research session in April, when Richard Yamada (Cornell) visits SAMSI.

1.3.12 Terrestrial Models

The Terrestrial Models subgroup of the “Development, Assessment and Utilization of Complex Computer Models” program is aimed at understanding vegetation response to climate change, incorporating processes that operate at fine spatial scales. Predictions of biodiversity response primarily emphasize climate envelopes, translating the climate range where a species is found today to maps of 2 x CO2 predictions of future climate. Yet real populations are controlled by processes that operate at spatial scales of meters, being limited by dispersal and soil variation (including hydrology). These fine-scale processes depend on climate, but their impacts depend on heterogeneity. The challenge of understanding climate impacts entails i) downscaling regional climate to realistic variation at relevant scales and ii) parameterizing that variation appropriately for forest response.

1.3.12.1 Introduction. Research Goals.

This research is intended to develop models that integrate regional climate change with landscape heterogeneity that can be used to explore biodiversity response. We specifically consider intermediate scale variation in climate, between fast weather (minutes to days) and slow climate (years), with biodiversity response. Although this is arguably the most logical scale to consider, as it describes the periodic droughts, warm springs, etc, to which plants respond, all modeling to date has been on year-to-year variation. Moreover, current efforts focus on a “climate envelope” approach, sometimes coupled with species area curves. This approach is problematic for a number of reasons discussed in Ibanez et al. (Ecology 2006). The landscape heterogeneity is being addressed with models that downscale regional climate to moisture and temperature gradients related to topography. Identified research goals are:

• Overall model. A hierarchical inference/forward simulation approach seems to be a useful vehicle for this analysis and can engage a wide range of interests and activities, merging an existing complex inferential model and a forward simulator that uses the estimates from the inferential model.

• Biodiversity predictions for climate change scenarios are a main goal: they are possible and would have a wide audience and impact.

• Emulator. Detailed analysis of the simulation output and construction of the emulator (Gaussian process model or others) based on simulation of detailed models. Can we fill in the parameter space? What are the sensitivities? Which parameters, in what states? Identify data needed to improve prediction.

• Different amounts and types of data available from different investigators. Many investigators have different types of information that bear on essentially the same sets of processes. What types can be included in a general scheme? How to best scale inference and simulation to large regions? How should high frequency forcing response be scaled to demographic scales (individual allocation over years)? How to deal with variables that are completely hidden? Investigate connections to regional hydrology.

1.3.12.2 Specific Activities

During this period, we have worked toward the goal of biodiversity predictions for climate change scenarios. Based on work to date, we expect such predictions to look much different from those currently in the literature. The elements of this analysis may ultimately include a compilation of seasonal climate predictions from our participants affiliated with NCAR; the model parameterized to seasonal climate, which is in development here at Duke; and construction of an emulator. There are three interrelated modeling efforts, summarized here:

Downscaling regional climate. We are building a statistical model to predict soil moisture over space and time under varying climate scenarios, using data from sites in North Carolina. The model can be cast in a state-space framework, with the spatial field of soil moisture constituting the unobserved state we wish to predict. The model for soil moisture evolves dynamically in time, taking into account precipitation, evapotranspiration, and runoff, while the observations generated by this underlying state are TDR measurements of soil moisture, irregularly sited in space and time. The model also includes unknown parameters controlling, for instance, the effects of climatological and topographical variables on the processes of evapotranspiration and runoff. We are implementing a particle filter algorithm to fit this model. This will allow us to sample from the posterior predictive distribution for soil moisture under varying climate scenarios, and these samples can then be used as input to the Clark group’s forest simulator.
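A bootstrap particle filter for a state-space model of this kind can be sketched as follows. This is a deliberately simplified, hypothetical stand-in and not the group's model: the state is a single scalar soil-moisture value rather than a spatial field, the water balance is reduced to a linear loss term plus a rainfall gain, the parameters (`loss_rate`, `gain`, `process_sd`, `obs_sd`) are treated as known, and the data are synthetic. Missing observations are represented by `None` to mimic irregular sampling in time.

```python
import math
import random


def bootstrap_particle_filter(rain, observations, n_particles, rng,
                              loss_rate=0.1, gain=0.02,
                              process_sd=0.01, obs_sd=0.02):
    """Filtered mean of a scalar soil-moisture state at each time step.

    observations[t] is a measurement, or None when nothing was recorded.
    All parameter values are hypothetical illustration values.
    """
    particles = [rng.uniform(0.1, 0.5) for _ in range(n_particles)]
    means = []
    for p, y in zip(rain, observations):
        # Propagate: simple water balance plus process noise, kept in [0, 1].
        particles = [
            min(1.0, max(0.0, (1.0 - loss_rate) * s + gain * p
                         + rng.gauss(0.0, process_sd)))
            for s in particles
        ]
        if y is not None:
            # Weight by the Gaussian observation density, then resample.
            weights = [math.exp(-0.5 * ((y - s) / obs_sd) ** 2)
                       for s in particles]
            if sum(weights) > 0.0:
                particles = rng.choices(particles, weights=weights,
                                        k=n_particles)
        means.append(sum(particles) / n_particles)
    return means


# Synthetic test case: simulate a "true" state, observe it every third step.
rng = random.Random(42)
n_steps = 60
rain = [10.0 if t % 15 == 0 else 0.0 for t in range(n_steps)]
truth, state = [], 0.3
for p in rain:
    state = min(1.0, max(0.0, 0.9 * state + 0.02 * p + rng.gauss(0.0, 0.01)))
    truth.append(state)
observations = [truth[t] + rng.gauss(0.0, 0.02) if t % 3 == 0 else None
                for t in range(n_steps)]

estimates = bootstrap_particle_filter(rain, observations, 500, rng)
```

The particles remaining after the last step approximate the filtering distribution of the state, so running them forward under a climate scenario would give draws from a predictive distribution. Estimating unknown parameters jointly with the state, as described above, requires extensions not shown here.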

Biodiversity simulation. The hierarchical inference/forward simulation approach of Govindarajan et al. (2004, 2007) is being used for this analysis. For this project the model is being extended to include landscape variation in soil moisture and temperature variation. This work is being led by Sean McMahon.

Model emulation. Jim Crooks is working on a novel way to perform non-parametric estimation and interpolation of distribution functions. The idea is to run the simulator multiple times under each of several simple climate scenarios, analyze the distribution of output in each scenario, and furthermore to predict this distribution under other scenarios. The output under a single scenario can be analyzed using a Dirichlet distribution, but to interpolate requires a method to join such distributions together. He is researching a way to do so using Gaussian copulas which allow control over the covariance structure.

References

• S. Govindarajan, M. Dietze, P. K. Agarwal and James Clark (2004). A scalable simulator for forest dynamics. In Proceedings of the Twentieth Annual Symposium on Computational Geometry, pp. 106–115. ACM Press: New York, USA.
• S. Govindarajan, M. Dietze, P. Agarwal and J.S. Clark (2007). A scalable algorithm for dispersing populations. Journal of Intelligent Information Systems, in press.
• Ibáñez, J. Clark, M.C. Dietze, K. Feeley, M. Hersh, S. Ladeau, A. McBride, N. Welch and M. Wolosin (2006). Predicting Biodiversity Change: Outside the Climate Envelope, Beyond the Species-Area Curve. Ecology, 87(8), pp. 1896–1906.

1.3.13 Inference and Uncertainty Analysis of Hydrological Models

Most deterministic environmental models lead to systematic deviations of model results from measured data if data time series at high resolution are available. This is mainly due to errors in input and model structure. Nevertheless, in practical applications, most uncertainty analyses have been performed ignoring these uncertainty sources and assuming independently distributed errors. Uncertainty estimates derived in this way are unreliable, because they are based on statistical assumptions that are not fulfilled. Hydrological watershed models are examples of these problems. They are a particularly suitable class of models for studying the methodological aspects of these problems because high-resolution data are available and simple watershed models show essentially the same problems as more complex ones. This makes it possible to study the problems efficiently with small models and to transfer the resulting solution techniques to tests with larger models at a later stage of the development. The goals of this working group are:

1. To compare different approaches for analyzing the causes of systematic deviations of the results of deterministic watershed models from measured data;
2. To develop improved model structures for addressing the deterministic components of these error sources;
3. To develop statistical descriptions of stochastic error sources that are not considered in conventional approaches.

Options for addressing these issues include:

1. Formulation of an adequate error model for precipitation time series used as model input;
2. Formulation of stochastic model equations and use of state estimation to learn about corrections to be made to model states to reduce systematic errors;
3. Use of time-dependent model parameters followed by exploratory analyses of possible influence factors on these parameters to learn about structural model improvements.

The techniques developed in the working group should lead to error models which are not significantly violated and therefore lead to more reliable uncertainty estimates. The working group will start with reading some key papers and testing selected techniques with a simple hydrological model. Then, a workshop will be held with scientists working at the interface between hydrological modelling and statistics to discuss the preliminary results and future work. This workshop is not yet scheduled. Key references are:

• Kuczera, G., Kavetski, D., Franks, S., Thyer, M., Towards a Bayesian total error analysis of conceptual rainfall-runoff models: Characterizing model error using storm-dependent parameters, Journal of Hydrology, in press.
• Vrugt, J.A., Diks, C.G.H., Gupta, H.V., Bouten, W., Verstraten, J.M., Improved treatment of uncertainty in hydrologic modeling: Combining the strengths of global optimization and data assimilation, Water Resources Research 41, W01017, doi:10.1029/2004WR003059, 2005.


1.4 Workshops

The Kickoff Workshop was held September 10-14, 2006. Its principal goals were to engage a broadly representative segment of the statistical, applied mathematical and computer modeling communities in the formulation and pursuit of specific research activities to be undertaken by the Program Working Groups, which include: i) Formulation of central research issues, ii) Identification of test bed computer models and data, iii) Formation of external partnerships between the Working Groups and others, especially Kickoff Workshop participants, interested in the Program. These goals were successfully attained, as can be seen in the rest of this report.

The Joint Engineering and Methodology Subprograms Workshop was held on October 26-27, 2006 at SAMSI. While the kickoff workshop was intended to get the working groups started, this one aimed to focus their orientation, profile the main issues and interests, and expose the working groups to insights, methods and ideas from experts in the field. The emphasis was on discussions, interaction, problem solving and the detailed specification of problems of engineering and more general methodology. As part of the workshop, Anthony O’Hagan (U. Sheffield) delivered a SAMSI distinguished lecture.

Joint NCAR and SAMSI workshop. Geophysical Models at NCAR: A Scoping and Synthesis Workshop (opening workshop for the climate modeling SAMSI group). The SAMSI climate/weather modeling group, jointly with the National Center for Atmospheric Research (NCAR), hosted a workshop at NCAR (Boulder) on November 13-14, 2006. The objective of the workshop was to introduce researchers to the atmospheric numerical models that are being developed and used at NCAR. It was intended as a scoping and brainstorming meeting where four NCAR modeling groups interacted with a large group of SAMSI statisticians and applied mathematicians interested in the design and analysis of computer experiments. The NCAR geophysical models/groups were:

• Upper atmosphere model (TIEGCM) (HAO: Maura Hagan, Ray Roble, Art Richmond)

• Single column boundary layer model (RAL/MMM: Josh Hacker)

• Two dimensional turbulence in Navier Stokes flows (IMAGe: Annick Pouquet)

• Land component of the NCAR climate model (CGD: Gordon Bonan; U Kansas: Johannes Feddema)

The intent was to identify concrete problems that would help structure the activity of the statistical climate modeling working group at SAMSI. For each modeling group, a statistical researcher serves as a liaison to guide collaboration between the modeling group and the statistical working groups.

Mathematical and statistical challenges in systems biology modeling, March 5–7, 2007. The workshop gathered many of the world’s leading experts in systems biology modeling to present their view of the key challenges and potential strategies for their solution. It was a stimulating 3 days, with many highlights, including a SAMSI Distinguished Lecture by Daniel Gillespie on the importance of stochastic effects in the modelling of genetic and biochemical networks. There was an excellent mix of disciplines present, including computational biologists, physicists, applied mathematicians, statisticians and probabilists. The two key problems highlighted by the workshop were parameter estimation issues (especially in the context of stochastic models), and multi-scale modelling of stochastic kinetic biochemical systems. The first problem (parameter estimation) is currently being tackled by the working group. The feasibility of establishing a SAMSI working group to tackle the second problem (multi-scale modelling) is currently being considered. Many of the workshop participants (including Greg Rempala, Daniel Gillespie and Linda Petzold) are keen to start a WG in this area, but there are currently too few people resident at SAMSI to get it off the ground. Irrespective of the formal formation of a WG for this problem, new collaborations in this area are very likely to arise following the workshop.

Theoretical and experimental developments in granular physics. The Statistical Mechanics of Granular Flow working group is planning a spring working meeting on theoretical and experimental developments in granular physics.

Joint SAMSI/MUCM Mid-Program Workshop, held April 2–3, 2007. This joint workshop brought together methodologists from the SAMSI program on the development, assessment and utilization of complex computer models with those in the UK Research Councils-funded MUCM (Managing Uncertainty in Complex Models) project, offering opportunities for cross-fertilization of these large-scale parallel efforts. A main goal was to report on the progress of ongoing projects as well as to coordinate efforts in research that has been stimulated by the close collaboration between the MUCM and SAMSI research teams. A stellar collection of speakers and participants agreed to participate, offering a unique opportunity that brought together an international community of scholars studying related problems. The format featured a series of talks, each followed by extended discussion of the issues highlighted by that and previous presentations. Selected outside experts in the field were invited to attend and to provide criticism and direction to those presenting research. The workshop was accessible via WebEx and teleconference for interested remote participants.

Satellite Meeting. The Terrestrial Models working group had an intensive one-day satellite meeting on April 4, 2007. This was intended to provide an update on the progress within the working group and to elicit input from, and generate collaborations with, a small group of interdisciplinary scientists with expertise in one or more of the relevant fields (ecology, climatology, statistics, computer science). Short presentations and discussions were planned on the work that has been done to date, feedback/input from all, and plans for potential new research to extend this effort. Invitees to the workshop included: Pankaj Agarwal, Duke; Jim Clark, Duke; Benoit Courbaud, CEMAGREF; Jim Crooks, SAMSI; Mike Dietze, Harvard; Ken Feeley, Wake Forest; Alan Gelfand, Duke; Cari Kaufman, NCAR, SAMSI; Sean McMahon, Duke; Paul Moorcroft, Harvard; Miles Silman, Wake Forest; Steve Sain, NCAR; Maria Uriarte, Columbia; Wei Wu, Duke. Duke Grads: Dave Bell and Carl Salk.

Joint SAMSI-NCAR Workshop. Application of Random Matrices Theory and Methods. Will be held at NCAR on May 7-9, 2007. This workshop serves as the transition workshop for the RM SAMSI climate modeling group. The structure is more traditional but will include a blend of tutorial and research talks. Ample time will be reserved for discussion.

Transition Workshop. Will be held on May 14–16, 2007. In this workshop the working groups will present their main achievements to the statistical sciences, applied mathematics, and natural sciences communities. They will also formulate follow-up plans for this SAMSI program to continue joint research and education in this interdisciplinary area. Ample discussion and interaction have been planned. Also, influential external speakers who can provide guidance as to future developments and collaboration will be invited to give talks and/or participate.

Transition/Satellite Meeting. The Calibration of Computational Models of Cerebral Blood Flow working group will follow the previous workshop with an intensive satellite meeting on blood flow modeling. The outside speakers will be Jordi Alastruey Arimon, Imperial College (modeling, computation), Suncica Canic, Dept. of Mathematics, U. of Houston (modeling and analysis), Vera Novak, Beth Israel Medical Center, Harvard (experiments), Brooke Steele, Biomedical Engineering, NCSU/UNC (modeling), Charles Taylor, Dept. of Surgery and Dept. of Mechanical Engineering, Stanford U. (modeling, calculations, experiments). We expect this workshop to:

1. Consolidate some informal collaboration between the various groups,

2. Identify remaining key issues in a “holistic” (modeling, computation, experiments) approach to the problem,

3. Give some exposure to the junior members of our group (K. DeVault, graduate student and G. Vernieres, SAMSI postdoc),

4. Foster new collaborations between MDs, engineers, mathematicians and statisticians.

Joint SAMSI-NCAR Workshop. Application of Statistics to Numerical Models: New Methods and Case Studies. This will be held at NCAR on May 21-24, 2007. This workshop serves as the transition workshop for the climate modeling group of the Computer Models SAMSI program. As such, one of the objectives of this workshop is to discuss and present results of the research conducted by the climate/weather working group. The structure will be that of a traditional conference but will include a blend of tutorial and research talks. Also, ample time will be reserved for discussion and for presentations on progress on the specific modeling project initiated in the first workshop.

1.5 Other Activities

1.5.1 Research Sessions; workshops

Calibration of computational models of cerebral blood flows. Dr. Novak and Dr. Zhao (Beth Israel Deaconess Medical Center, Harvard University) visited the working group at SAMSI in Fall 06 to discuss various issues related to data acquisition, measurement procedures and interfacing with the numerics.

System Biology. March Intensive Research Sessions involving a select group of contributors from the US and abroad. The main theme is parameter inference for deterministic biochemical network models, focused on a metabolic network model in fish that is of interest to the Environmental Protection Agency. A goal is to have the working group focus quickly on its future activities.

The Granular Materials - Engineering Applications working group is planning an all-hands intensive working session during the Spring.

System Biology. April Intensive Research Sessions. In April there will be an intensive research session on parameter inference for stochastic kinetic models of molecular motor dynamics, based on models and data originating from Cornell.

Statistical Mechanics of Granular Flow. Follow-up activities. The initial effort on ensuring realistic simulation capabilities occupied the time of this working group's researchers during 2006 and early 2007. It is tentatively planned to resume a discussion between experimental, computational and theoretical scientists in Summer 2007 on how to use these results.

Terrestrial Models. The April Workshop of this working group will be integrated with the wireless sensor Program in 2007/08, providing important background on modeling.

1.5.2 Research projects

Several Research Projects have also been submitted as a direct consequence of the work and collaboration during the Program:

1. M. Fuentes, Shane Reese, Astrid Maute, Mike Wiltberger, and Art Richmond. “Models, Tools and Analysis for Studies of the Upper Atmosphere and the Magnetosphere”. CMG Collaborative Research.

2. S. Guillas. College of Sciences Faculty Research Development Grant-Georgia Tech Foundation, Statistical calibration of RAQAST (2006). Funded, $10,434.00

3. S. Guillas. NASA-Atmospheric Composition Modeling and Analysis: High Resolution Assessment of Stratosphere-Troposphere Exchange and its Impact Upon the Background Tropospheric Ozone Concentrations. Program: Atmospheric Composition Modeling and Analysis, co-PI, pending, $55,849.00

4. S. Guillas. NSF-DMS: CMG–Statistical Improvements of Three-Dimension (3-D) Model Air Quality Forecasts. PI, pending, $704,562.00

5. E. B. Pitman and U. Buffalo geologist M.F. Sheridan, with support from M.J. Bayarri, J. Berger and L. Pericchi, submitted a proposal to investigate debris flows at Tungurahua in Ecuador and Ruapehu Volcano in New Zealand.

6. E.B. Pitman, A. Patra and U. Buffalo geologist M. Bursik have obtained funding to develop advanced simulation methodology for mass flows.

7. Robin Tokmakian (PI, Naval Postgraduate School), James Gattiker, Peter Challenor, “Uncertainties within a Climate Model’s Ocean Component”, submitted to US Department of Energy Office of Biological and Environmental Research. Jan 2007.

1.6 Education and Outreach

1.6.1 Credit Courses

The Program has offered two courses, one in the Fall and one in the Spring semester, each worth 3 credits/units at each of the participating universities.

Granular Flow Course. To help educate the next generation of physicists, engineers, mathematicians and statisticians, a course (3 credits/units) on the flow of granular materials was offered during the Fall 2006 semester. Seven students (4 women, 3 men) and 3 post-doctoral fellows (1 woman, 2 men) participated in the course. The instructor was Bruce Pitman (U. Buffalo).

Course on Environmental Modelling. The Subprogram on Ecological, Environmental and Climate models is offering a course during Spring 2007, which includes not only detailed study of these models, but also statistical inference and uncertainty analysis for them. It is held at SAMSI on Wednesdays from 4:30 to 7 pm. Instructors: M. Fuentes, J. Clark, Peter Reichert, G. Hegerl.

1.6.2 Workshops

The following workshops are mainly oriented to graduate students, post-docs and new researchers, with a primarily educational goal, to make these new researchers familiar with the models, problems and issues in the development, design and utilization of Computer Models.

Summer School on the Design and Analysis of Computer Experiments, August 11-16, 2006 at IRMACS, Simon Fraser University. This Summer School was conducted jointly between SAMSI and the Canadian National Program on Complex Data Structures. It was an excellent opportunity for students, new researchers, and others interested in becoming involved with the study of computer models to learn many of the latest methodological developments in the area. It consisted of a two-day Short Course, a two-day Software Symposium, and two days of Hands-on Problem Solving. An Undergraduate Workshop was held at SAMSI on March 2-3, 2007. Approximately 25 undergraduate students from 9 undergraduate majors and 15 colleges and universities were shown how computer models are used in the engineering design of consumer products, the description of watershed activity, the prediction of climate and weather, and the temporal evolution of terrestrial land cover. A computer laboratory exercise was used to illustrate aspects of the complexities of using such models.

SAMSI Two-Day Undergraduate Workshop, March 2-3, 2007 at SAMSI. Approximately 30 undergraduate students from various undergraduate majors, colleges, and universities were shown how computer models are used in the engineering design of consumer products, the description of watershed activity, the prediction of climate and weather, and the temporal evolution of terrestrial land cover. The workshop presented the idea of mathematical models, their numerical computer implementation, and their statistical utilization, in a wide variety of scenarios and at a level adequate for the wide range of students present. Presentations were given by both new and senior researchers, and hands-on computer practice helped students grasp the basics of computer simulators and understand the role of the inputs and input sensitivity, and also served to illustrate aspects of the complexities of using such models. Strong emphasis was placed on discussion, which was lively and insightful. The workshop was very well attended (the pre-registrations amply exceeded the quota for this workshop), with students from all over the USA. The Workshop accomplished the goals of exposing a wide diversity of bright students to the area of Complex Computer Models, their development, assessment and utilization, and of interesting them in it.

Summer Graduate Workshop on Data Assimilation for the Carbon Cycle. To be held at NCAR on July 8-13, 2007. This summer school will expose students in the geosciences, ecology, and mathematics to multidisciplinary science through a focus on estimating the sources and sinks of carbon for the Earth system. One goal is to train the next generation of researchers to work within a multidisciplinary science team that combines geoscientists, ecologists, applied mathematicians, and statisticians. Participants will obtain an overview of this problem as well as some specific skills in tackling inverse problems and working with geophysical and biogeochemical models.

1.6.3 Presentations at Professional Meetings

Apart from the meetings, workshops, presentations at working groups, etc., directly organized by the Program, the following presentations at Professional Meetings have been held or are planned for the immediate future:

• J. Gattiker, “Calibrating Climate Models: a case study and comparison of DACE approaches”, Environmental Statistics Section of the Royal Statistical Society meeting in Southampton, 12 Jan 2007.

• M.J. Bayarri, J.O. Berger and G. Molina. “Incorporating Uncertainties into traffic Simulators”, Invited Talk. Cost Action 285 Final Symposium on Recent Advances for Modeling and Simulation Tools for Communications Networks and Services. University of Surrey, United Kingdom, 28-29 March 2007.

• M.J. Bayarri, J.O. Berger, F. Liu, R. Paulo and J. Sacks. Invited talk: Validation of complex computer models with functional outputs. IMS Spring Research Conference, Iowa State, Ames, IA (USA), May 21-23 2007

• Jim Berger. “Analysis of Complex Computer Models of Processes”. Invited talk. BISP5, Fifth Workshop on Bayesian Inference in Stochastic Processes. Valencia (Spain), June 14-16, 2007

• Richard Boys. “Bayesian inference for stochastic epidemic models with time- inhomogeneous removal rates”. Invited talk. BISP5, Fifth Workshop on Bayesian Inference in Stochastic Processes. Valencia (Spain), June 14-16, 2007

• M.J. Bayarri, J.O. Berger, F. Liu, and R. Paulo. “Modeling issues when combining field and computer model data for prediction”. Invited talk. International Workshop on Statistical Modelling 2007. Barcelona, July 2–6, 2007.

BIRS. Darren Wilkinson and other participants of the System Biology working group will be involved in the BIRS workshop on “Bioinformatics, Genetics and Stochastic Computation: Bridging the Gap” at the Banff International Research Station, Canada, July 2–6, 2007 (http://www.pims.math.ca/birs/birspages.php?task=displayevent&event_id=07w5079). This will provide a natural venue for continuing face-to-face discussion among the community that is developed, as well as for disseminating work and results from the SAMSI Program.

JSM07. There will be 3 Topic Contributed Sessions organized under the SAMSI umbrella for the 2007 Joint Statistical Meetings under the Complex Computer Models Program:

1. Methodological Issues in Engineering Applications of Computer Models: A SAMSI Program. Organized by Tom Santner. Sponsored by the Section on Bayesian Statistics (SBSS) of ASA. Speakers are Herbert Lee, Fei Liu, Gentry White, Dianne Bautista and Laura Swiler.

2. The role of statistics in ecological and climate modeling. Organized by M. Fuentes. Sponsored by the climate and weather modelling SAMSI working group and the Environmental Statistics (ENVR) Section of the ASA.

3. Using Computer Models for Ecological, Environmental, and Biological Applications: A SAMSI Program. Organized by M.J. Bayarri. Sponsored by the Section on Bayesian Statistics (SBSS) of ASA. Speakers are Leonard Smith, Peter Reichert, Ariel Cintron-Arias, Bruce Pitman, Serge Guillas.

1.6.4 Seminars

A number of presentations at Department seminars have been given or are planned for the near future:

• Serge Guillas. Calibration of a 3-D air quality model. U. of Chicago, CISES seminar, February 2007.

• Doug Nychka (NCAR). “The ensemble Kalman filter: The movie”. NCSU Seminar and special presentation for the climate modeling group. March 15, 2007.

• Darren Wilkinson. Biostatistics seminar at the NIEHS in RTP.

• Ma Chunsheng. Space-time downscaling of regional ozone forecasts with nonseparable covariance models. May 2007.

1.6.5 Graduate students

The Program is contributing to the achievements, education, and Ph.D. projects of many graduate students, both local and non-local:

Dianne Bautista visited SAMSI from September 1, 2006 until December 15, 2006 to participate in the Program on Development, Assessment and Utilization of Complex Computer Models. She is a graduate student in the Department of Statistics at The Ohio State University working on her PhD thesis. She participated in several working groups, mainly in the Engineering Methodology and Methodology working groups. During her visit, Dianne Bautista worked on two projects related to her thesis. The first of these studied a non-parametric method of estimating (a valid) correlation function as part of the process of predicting the output of a computer code. The second project was to develop a method for sequentially designing a computer experiment with multivariate output to find the Pareto Frontier of the set of codes. Both projects are continuing upon her return to Ohio State.

Miyuki Breen (NCSU) Estimating biological models. Involved in the System Biology working group.

Kristen Dang (UNC) Dynamic modelling of virus life cycle. Involved in the System Biology working group.

Kristen DeVault (NCSU graduate student) is a key member of the Calibration of computational models of cerebral blood flows working group and is expected to graduate in Spring 08. She will visit Dr. Novak’s group at Harvard in Summer 07.

Gang Han visited SAMSI from September 1, 2006 until December 15, 2006 to participate in the Program on Development, Assessment and Utilization of Complex Computer Models. He is a graduate student in the Department of Statistics at The Ohio State University working on his PhD thesis. He participated in several working groups, mainly in the Engineering Methodology and Methodology working groups. Gang Han completed work on developing methodology for computer experiment output that has inputs which are either, by their nature, nominal-valued or should be treated as such. These types of inputs occur in biomechanics applications, where inputs such as mesh density or the level of discretization of a functional input occur. He has developed software that allows the fitting of these mixed qualitative-quantitative models. Gang also started work on producing a methodology (and MATLAB software) for simultaneously determining calibration and tuning-parameter inputs to computer codes. This problem occurs in settings in which both computer and physical experiments have been conducted, the computer code has some calibration inputs whose values in the physical experiment are unknown to the experimenter, and, additionally, there are (numerical) tuning parameters, present only in the computer code, that can be used to force the computer code output nearer to the physical experiment output.

Fei Liu (Duke). Fei Liu actively participates in a number of working groups: Air Quality, where she gave a presentation: “Calibration for spatial and spatio-temporal model outputs”; Engineering Methodology, where she gave two presentations: “Discussion of a Thermal Model”, and “Dynamic Linear Models as emulators”; Terrestrial models, and also in the Methodology working group, where she is the web-master. She will be presenting at the Transition Workshop in May, and also at the JSM07 SAMSI Topic Contributed Sessions in July.

Simon Lunagomez of Duke University ISDS is working under the guidance of Robert Wolpert (also of Duke ISDS) in developing hierarchical Bayesian models for pyroclastic flows, intended to help predict the frequencies of large volcanic eruptions over decade- or century-long periods. The models have underlying Pareto components for large individual flow eruptions and α-stable components for the aggregation of many smaller flows, all tailored to the kinds of data emerging from the Montserrat Volcano Observatory (MVO), established in 1995 by the British Geological Survey and the University of the West Indies’ Seismic Research Unit to study the ongoing volcanic activity of the Soufrière Hills volcano. SAMSI participant Bruce Pitman (Math, Univ. Buffalo) and his colleague Eliza Calder (Geology, Univ. Buffalo, and MVO alumna) have secured access to the MVO data for Lunagomez and have assisted in its interpretation. A range of modeling, numerical, and methodological issues have already arisen in this work, giving Lunagomez a remarkable educational opportunity. This work will be part of his Ph.D. Thesis. He is also in the early stages of building stochastic process-based models for Gamma Ray Bursts (GRBs), with the help of Cornell University astronomer and SAMSI participant Tom Loredo. Simon Lunagomez gave a presentation to the Methodology working group on March 12, 2007.

Jarad Niemi (Duke) Parameter estimation for stochastic biochemical network models. Involved in the System Biology working group.

Justin Shows. Graduate student at NCSU, working under Dr. Fuentes’ supervision. Mr. Shows has passed the written and oral Ph.D. exams. Current mesoscale numerical weather prediction (NWP) models use complex, multi-layer, soil and canopy models to specify time-dependent lower boundary conditions for atmospheric solutions. Parameters, with values typically constrained by empiricism or heuristic physical arguments, are ubiquitous in these models. The literature shows that atmospheric solutions can be sensitive to choices of a few of them. In practice, many of these parameters can be viewed as tuning knobs, and subjective tuning is acknowledged practice in NWP implementations. Many parameters are not necessarily constant, and may vary slowly in time. Complex soil models attempt to account for a wide range of physical processes. Because these models provide lower boundary conditions for atmospheric models, and the metrics for success and utility are usually based in the atmospheric component, an argument can be made that simpler models should be constructed and objectively tuned. Simpler models usually have fewer parameters that control the atmospheric response, and their functional relationship to the atmosphere is usually more accessible. The goal of Mr. Shows’s project, in collaboration with NCAR and under the supervision of Dr. Fuentes (NCSU), is to design and explore optimal methodologies for finding distributions of parameters. Experience shows that ensemble data assimilation is a useful paradigm for approaching this problem, since covariances between prior distributions in observation space and parameter distributions are readily available. The 1D model described above also enables efficient research on this topic. Some references for ensemble data assimilation within this modeling framework are Hacker et al. (2007) and Hacker and Rostkier-Edelstein (2007). Mr. Shows will be presenting his research at the May 21-24 SAMSI/NCAR workshop.

Richard Yamada (Cornell) Stochastic kinetic modelling of molecular motor dynamics. Involved in the System Biology working group.

2. High Dimensional Inference and Random Matrices

2.1 Program and its Objectives:

Random matrix theory lies at the confluence of several areas of mathematics, especially number theory, combinatorics, dynamical systems, diffusion processes, probability and statistics. At the same time, random matrix theory may hold the key to solving critical problems for a broad range of complex systems from biophysics to quantum chaos to signals and communication theory to machine learning to finance to geophysical modeling. This Program was a unique opportunity to explore the interplay of stochastic and mathematical aspects of random matrix theory and its applications. The aim of the program was to bring together researchers interested in the theory and applications of random matrices to share their results, discuss new research directions and develop collaborations. The program concentrated on large-dimensional random matrices and the problems that make use of them. In particular, emphasis was on how developments in random matrix theory might impact statistical inference in high dimensional systems. The program has had two parts. In the fall, there was an emphasis on statistical inference and the theory of random matrices. In the spring semester, the focus shifted to issues arising in connection with geometry and random matrices.

2.2 Core Group

A core group of researchers gathered at SAMSI during the Fall Semester, 2006 and this group was complemented by a group on the West Coast that gathered every Friday and held a video-conference with SAMSI. This link facilitated significant exchange between the two groups. In addition, various people joined the program through Webex connections to the working groups.

2.2.1 Senior researchers (at SAMSI for significant periods of time in Fall, 2006):

• Jianqing Fan (Princeton University)
• Eitan Greenshtein (Purdue University)
• Thomas Guhr (Lund University)
• Christian Houdre (Georgia Institute of Technology)
• Helene Massam (York University)
• Peter Miller (University of Michigan)
• Greg Rempala (University of Louisville)
• Don Richards (Penn State University)
• Nan Wermuth (Chalmers U of Technology)
• Ofer Zeitouni (University of Minnesota)

2.2.2 Senior Researchers (Spring, 2007)

• Mikhail Belkin (Ohio State University)
• Yoonkyung Lee (Ohio State University)
• Feng Liang (University of Illinois Urbana Champaign)

2.2.3 Key Berkeley node researchers

• Noureddine El Karoui (University of California Berkeley)
• Iain Johnstone (Stanford University)
• Peter Bickel (University of California Berkeley)
• Craig Tracy (University of California Davis)
• Bin Yu (University of California Berkeley)

2.2.4 New researchers

• Debashis Paul (University of California Davis)
• Makram Talih (Hunter College)
• Ming Yuan (Georgia Institute of Technology)

2.2.5 Postdoctoral fellows

• Manjunath Krishnapur
• Jayanta Pal
• Bala Rajaratnam
• Cari Kaufman (Joint with Complex Computer Models Program)
• Elaine Spiller (Joint with Complex Computer Models Program)

2.2.6 Local faculty

• Ilse Ipsen (North Carolina State University)
• Yufeng Liu (University of North Carolina)
• Sayan Mukherjee (Duke University) (Spring)
• Jack Silverstein (North Carolina State University)
• Len Stefanski (North Carolina State University)
• Young Truong (University of North Carolina)
• Stephanos Venakides (Duke University)
• Li Lexin (North Carolina State University) (no teaching release)
• Haipeng Shen (University of North Carolina) (Spring)

2.2.7 Graduate students

• Sergei Belov (Duke University)
• Hongyan Cao (University of North Carolina)
• Zhenglei Gao (Duke University)
• William Lefew (Duke University)
• Trevia Litherland (Georgia Institute of Technology)
• Jinchi Lv (Princeton University)
• Xingye Qiao (University of North Carolina)
• Teresa Selee (North Carolina State University)
• Dhruv Sharma (North Carolina State University)
• Hua Xu (Georgia Institute of Technology)
• Yingying Fan (Princeton University)
• Yufan Zhao (University of North Carolina)

2.3 Program Organization

2.3.1 Opening workshop

The Opening Workshop for the SAMSI program on High Dimensional Inference and Random Matrices was held Sunday-Wednesday, September 17-20, 2006, at the Radisson Hotel RTP in Research Triangle Park, NC. It opened on Sunday, September 17, with tutorials by Craig Tracy and Ofer Zeitouni.

The goal of the opening workshop was to allow community input to the formation of the working groups for the program, as well as to promote engagement (via web, teleconference, and videoconference) of those who would not be resident at SAMSI during the program. The workshop program focused heavily on open problems in high dimensional inference and random matrix theory for which solutions are not currently available.

The workshop was organized by Iain Johnstone, Peter Bickel, Hélène Massam, Douglas Nychka, and Craig Tracy.

The workshop included a number of distinguished speakers. In addition to the Distinguished Lecture by David Donoho (see below), it featured overview lectures by Roland Speicher and Alan Edelman.

On Tuesday evening, the program leaders' committee met with some of the key participants who were to be present at SAMSI, and a first cut at the working groups was made. After discussions the following day, a strong list of working groups was formed and the participants signed up for those that interested them. There was an extraordinary response to the working group call, with almost all of the workshop participants staying for the working group formation.

2.3.2 Bayesian Focus Week

The Bayesian Focus Week workshop was held October 30 to November 3, 2006, at the Radisson Hotel RTP in Research Triangle Park, NC. The themes of the workshop were inference in high-dimensional graphical models, choice of priors for efficient model search in Gaussian and log-linear models, and algorithms for model selection. Problems of inference in Gaussian graphical models include high dimensional covariance estimation, estimation of eigenvalues, computation of moments, and Gaussian graphical models for time series. Theoretical as well as implementation properties of various priors were considered for model selection.

The emphasis in the workshop was on discussion, interaction, problem solving, and the identification of new problems in a cross-disciplinary setting, concentrating on high-dimensional problems and exploring the possible use and applications of large random matrices. The workshop was organized by Hélène Massam, Peter Bickel, and Mike West.

2.3.3 Large Graphical Models and Random Matrices Workshop

The workshop was held November 9-11, 2006, at the MCNC Auditorium and the Radisson Hotel RTP in Research Triangle Park, NC. It focused on the following topics: transforming real-valued matrices to study induced associations in linear systems, and transforming binary matrices to study the preservation of independencies in Markov graphs; matrix and path criteria for Markov equivalence, for separation in Markov graphs, and for identification of corresponding models under given distributional assumptions; the integration of clustering and of censoring into graphical models; issues of model fitting, model selection, and model uncertainty for graphical models; and consequences for a given generating process of omitting variables and of conditioning on variables or on the propensity score. The workshop was organized by Nanny Wermuth, Hélène Massam, and David Cox.

2.3.4 Workshops with NCAR

A number of workshops with NCAR were planned. Some of these have been primarily organized through the Complex Computer Models Program but involve some HDRM participants (for instance, the Geophysical Models Workshop, November 13-14). A further workshop is planned for May, to be held at NCAR. This workshop will emphasize the problems in random matrix theory that arise in large geophysical models. It will be entitled "Application of Random Matrices: Theory and Methods", will be held May 7-9 at NCAR in Boulder, Colorado, and is organized by Thomas Bengtsson, Montse Fuentes, and Peter Bickel.

2.3.5 Geometry workshop

The workshop on Geometry, Random Matrices, and Statistical Inference was held January 16-19, 2007, at the NISS Building in Research Triangle Park, NC. The four-day workshop kicked off the semester-long focus on Geometry and Random Matrices. Both algorithms and the fundamental mathematical objects computed by the algorithms were stressed. This workshop is being followed by a semester-long working group on "Geometry, Random Matrices, and Statistical Inference." The format of the workshop had two talks in the morning and one in the afternoon. All talks were one hour long, and there was considerable time for discussion. The workshop was organized by Misha Belkin (The Ohio State University, Chair), Feng Liang (University of Illinois at Urbana-Champaign), and Sayan Mukherjee (Local Scientific Coordinator).

2.3.6 Transition workshop

The transition workshop was held at the American Institute of Mathematics in Palo Alto: April 10-13, 2007 as an ARCC workshop. Apart from the very pleasing bracketing of the program as having been both initiated and culminated at AIM, the nature of this transition workshop was ideally suited to the format of an ARCC workshop. Indeed, the leadership of the ARCC team in facilitating the group to collect its thoughts and focus their ideas into a blueprint for future research was beneficial and efficient for achieving the goals of the workshop. This workshop brought together a group of applied mathematicians active in random matrix theory with theoretical statisticians (and probabilists) concerned with high dimensional inference particularly via eigen-structure methods. It also engaged methodologically oriented researchers from application domains, such as climate prediction, in which large p data analysis has long played a major role.

The format of the meeting was that during the first day and a half, reports on the working group activities from the fall SAMSI program were delivered. Each of the five main reports was given by a duo of one senior and one junior researcher:

1. Bayesian Methods and Graphical Models: Hélène Massam and Bala Rajaratnam
2. Geometric Methods: Makram Talih and Armin Schwartzman
3. Geometry and Statistical Inference: Sayan Mukherjee and Misha Belkin
4. Regularization and Covariance: Peter Bickel and Debashis Paul
5. Multivariate Distributions: Don Richards and Noureddine El Karoui

In addition, a talk on the results of the wireless communications collaboration was given by Jack Silverstein. The emphasis was on discussions about achievements of the working groups and future directions.

2.3.7 Distinguished Lectures

Two SAMSI Distinguished Lectures were held as part of the program. Each was part of a workshop.

David Donoho, Stanford University, Monday September 18, 2006 The Breakdown Point of Model Selection When There Are More Variables Than Observations

David Cox, Nuffield College, Oxford, Thursday November 9, 2006 Some Statistical Challenges Arising from an Issue in Veterinary Epidemiology

2.4. Activities

2.4.1 Working groups

The working groups met regularly throughout the program to pursue particular research topics identified in the kickoff workshop. • Climate and Weather • Wireless Communications • Universality • Regularization and Covariance • Geometric Methods • Multivariate Distributions • Graphical Models/Bayesian Methods • Estimating functionals of a high dimensional sparse vector of means

2.4.2 Berkeley node

Each Friday during the fall semester, a video-conference link was established between SAMSI and UC Berkeley. The link featured talks, discussions, and working group activities. It interfaced with both the Universality and the Regularization working groups.

2.4.3 Courses

2.4.3.1 Fall 2006 SAMSI course on Random Matrices

This course was an introduction to properties of eigenvalues of classes of random matrices fundamental to several areas of application, including multivariate statistics, high energy physics, numerical analysis, and wireless communications. Most of the results are expressed as limit theorems as the dimensions of the matrices increase; their significance is that they make it possible to understand spectral behavior for random matrices of large dimension. The main results covered include the limiting behavior of the empirical measure of the eigenvalues (law of large numbers and CLTs for linear statistics), extreme eigenvalues, the distribution of the largest eigenvalue, and concentration inequalities and large deviation theory for the empirical measure and the maximal eigenvalue.
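As an illustration of the kind of limit theorem covered, the following minimal sketch simulates a sample covariance matrix with identity population covariance and compares its extreme eigenvalues with the edges of the Marchenko-Pastur limiting support. It is not taken from the course materials; the dimensions, the seed, and the Gaussian data are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 500                 # sample size and dimension (illustrative values)
gamma = p / n                    # aspect ratio p/n

X = rng.standard_normal((n, p))  # rows are i.i.d. N(0, I_p) observations
S = X.T @ X / n                  # sample covariance, mean assumed known to be zero
eigs = np.linalg.eigvalsh(S)     # eigenvalues in ascending order

# Edges of the Marchenko-Pastur support when the population covariance is I_p
lower = (1 - np.sqrt(gamma)) ** 2
upper = (1 + np.sqrt(gamma)) ** 2

print(f"Marchenko-Pastur support: [{lower:.3f}, {upper:.3f}]")
print(f"observed eigenvalues:     [{eigs.min():.3f}, {eigs.max():.3f}]")
```

Although every population eigenvalue equals 1, the sample eigenvalues spread over an interval whose width is governed by p/n, which is why the limiting spectral behavior matters for inference.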

The basic mathematical tools used to prove these results were introduced, among them being: moment methods, Stieltjes transforms, concentration inequalities and large deviations, and integrable systems. Lectures on applications to multivariate statistics were also included.

The course was team-taught by individuals who are renowned experts in random matrices.

Schedule of Instructors and topics covered: • September 7, 14 Jack Silverstein and Bala Rajaratnam: Introduction to random matrices and the rmtools software. • September 21, 28, October 5, 12 Ofer Zeitouni: Method of moments for law of large numbers, CLTs and largest eigenvalue; concentration; GOE-GUE; large deviations; Fredholm determinants, determinantal point processes. • October 19, 26 Jack Silverstein: Stieltjes transform methods. • November 2, 9 James Mingo: Free probability and random matrices. • November 16, 30 Peter Miller: Asymptotics for general orthogonal polynomials; Riemann-Hilbert approach. • December 7, 14 Debashis Paul: Multivariate statistical applications.

2.4.3.2 Spring 2007 SAMSI course on Geometry, Random Matrices, and Statistical Inference

From the perspective of inference, clustering, and machine learning, geometric ideas have been gaining greater emphasis. One reason for this has been the realization that predictive models with a small amount of labeled data can be greatly improved by incorporating unlabeled data. Thus the geometry of the marginal distribution provides salient and compelling information in many real world problems. This insight has led to a variety of statistical models and algorithms as well as the study of a variety of mathematical objects. A nonexhaustive list follows: spectral clustering, nonlinear dimensionality reduction, manifold learning, learning homologies, topological persistence, semi-supervised learning, non-parametric semi-supervised Bayesian models, the Laplace-Beltrami operator, graph diffusion models on manifolds, and random projections. Most if not all of the above topics are intimately related to the study of random matrices, either from an algorithmic perspective or from the perspective that the structure of a random matrix depending on data drawn from a measure is fundamental in understanding the topic. These topics are being presented from the perspective of statisticians, computer scientists, and mathematicians. The course started with a lightning review of statistical inference, topology, and differential geometry and then proceeded to a seminar format. There will be a final project consisting of any of the following:

1. paper review
2. algorithm/model development
3. data analysis
4. theoretical analysis

Instructors: Misha Belkin, Sayan Mukherjee, and Yury Mileyko
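To make the link between these topics and matrices built from data concrete, here is a minimal sketch of spectral clustering with an unnormalized graph Laplacian. It is only an illustration and is not drawn from the course materials; the two synthetic clusters, the Gaussian kernel with unit bandwidth, and the seed are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 30                                             # points per cluster (assumed)
A = rng.normal([0.0, 0.0], 0.5, size=(m, 2))       # cluster centered at (0, 0)
B = rng.normal([5.0, 0.0], 0.5, size=(m, 2))       # cluster centered at (5, 0)
X = np.vstack([A, B])
truth = np.repeat([0, 1], m)                       # true cluster membership

# Gaussian-kernel similarity graph on the data (bandwidth 1, no self-loops)
sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq_dist / 2.0)
np.fill_diagonal(W, 0.0)

# Unnormalized graph Laplacian L = D - W
L = np.diag(W.sum(axis=1)) - W

# The eigenvector of the second-smallest eigenvalue (the Fiedler vector)
# changes sign between weakly connected groups of points.
vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]
labels = (fiedler > 0).astype(int)

# Cluster labels are only defined up to swapping 0 and 1
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"agreement with true clusters: {agreement:.2f}")
```

The Laplacian here is a random matrix determined by data drawn from a measure, and its low eigenvectors carry the geometric information that the clustering uses.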

2.5. Working Group Reports

2.5.1 Climate and Weather

Group Leaders: Serge Guillas, Cari Kaufman, Doug Nychka, Debashis Paul Webmasters: Cari Kaufman and Elaine Spiller

The climate and weather group met as a reading group in preparation for the joint SAMSI/NCAR workshop held in November, at which point the climate and weather groups from the Computer Models and Random Matrices programs merged. The group focused primarily on principal component analysis and the ways it is used by climate scientists. In particular, we studied methods for detection and attribution of climate change. We also had two guest speakers: Chunsheng Ma, who spoke on space-time covariance functions that might be appropriate for climate data, and Andrew Gettelman, who gave a remote tutorial from NCAR on climate modeling.

The group developed several ideas for research directions.

1) The usual tests and asymptotic distributions in principal component analysis assume the rows of the data matrix are exchangeable, which clearly does not hold in the case of climate observations that are taken over time. How should one choose the number of components to retain in this case? Jonty Rougier and a graduate student are working on a permutation test.

2) Can better estimates of the covariance matrix be used in the "fingerprint" methods for climate change detection? Current practice is to use the sample covariance matrix.

3) Can the covariance functions described by Chunsheng Ma (2005), which are negative over certain distances, be used to capture large-scale oscillation patterns in the climate system? In particular, can fitting these models to climate models and observations separately be used as a diagnostic tool for assessing how well the climate model captures such patterns? Cari Kaufman and Chunsheng Ma are exploring this for some pressure data.

4) An alternative to principal component analysis proposed by Debashis Paul is to first premultiply the data by the square root of a weighting matrix W, which would concentrate the resulting eigenvectors in certain spatial regions.
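The weighting idea in item 4 can be sketched as follows. This is a hypothetical illustration, not the group's implementation: it assumes a data matrix with spatial locations in rows and time points in columns, a diagonal W, structureless synthetic data, and an arbitrary "region of interest" made up of the first five locations.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 20, 200                   # p spatial locations, n time points (assumed layout)
X = rng.standard_normal((p, n))  # synthetic data with no spatial structure

w = np.full(p, 0.01)             # hypothetical weights: small everywhere...
w[:5] = 1.0                      # ...except in a region of interest (first 5 sites)
W_sqrt = np.diag(np.sqrt(w))     # square root of the diagonal weighting matrix W

Y = W_sqrt @ X                   # premultiply the data by W^{1/2}
C = Y @ Y.T / n                  # p x p covariance of the weighted data
vals, vecs = np.linalg.eigh(C)
leading = vecs[:, -1]            # eigenvector of the largest eigenvalue

# Fraction of the leading eigenvector's squared norm inside the weighted region
mass_in_region = np.sum(leading[:5] ** 2)
print(f"share of leading eigenvector in region: {mass_in_region:.3f}")
```

Even though the unweighted data favor no location, the leading eigenvector of the weighted covariance lies almost entirely in the up-weighted region.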

2.5.2 Wireless Communications

Group Leader: Jack Silverstein

In this group, we looked at several problems involving random matrices arising in the context of wireless communications. J. Silverstein and D. Paul have been involved in a project that looks into the behavior of eigenvalue statistics, such as the empirical spectral distribution and the separation of eigenvalues from the bulk spectrum, for the random matrices appearing in MIMO and CDMA systems. The problems considered are also relevant in spatio-temporal signal processing. In the latter case there is also interest in investigating the properties of the sample eigenspaces.

2.5.3 Universality

Group Leader: Peter Miller Webmaster: Manjunath Krishnapur

The Universality Working Group nucleated at the Opening Workshop held September 17-20, 2006. Three topics emerged between the Opening Workshop and the first meeting of the group that were of most interest to participants: (i) Universality of Circular Ensembles (random unitary matrices), (ii) "Beta-ensembles" (generalized eigenvalue statistics beyond the basic "threefold way" of matrix symmetries; representation and sampling issues), and (iii) Differential operator analogues of random matrix theory. The first order of business for the Working Group was to have a focused discussion of these three topics with the aim of narrowing our scope to a list of specific questions within each topic.

The Universality Working Group functioned primarily as a venue for informal presentations by and for the group participants. All of the three topics listed in the paragraph above were addressed by these presentations and the ensuing discussions. Speakers included M. Huber ("Some experiments on tridiagonal matrices for beta ensembles"), M. Krishnapur ("Circular ensembles", "An invariance principle with applications to random matrix theory and last passage percolation", and "Non-Hermitian random matrices and the circular law"), P. Miller ("Asymptotics of orthogonal polynomials on the unit circle by Riemann-Hilbert techniques", "D-bar methods for orthogonal polynomials", and "Zakharov-Shabat eigenvalue problems"), D. Paul ("Beta ensembles and tridiagonal matrix models"), and S. Venakides ("Zakharov-Shabat operators as analogues of random matrix models" and "Riemann-Hilbert techniques"). Notes from all of these presentations can be found on the SAMSI website.

The Universality Working Group also had a significant interaction with the "Berkeley Node" of SAMSI. Via the videoconference link the Working Group participated "live" in presentations made in Berkeley by L. Choup, A. Dembo, S. Peche, and A. Soshnikov. This interaction was very useful for both the local participants and those in the studio in Berkeley.

Among the most regular local participants in the group were S. Belov (Duke), M. Huber (Duke), M. Krishnapur (UNC/SAMSI), W. LeFew (Duke), P. Miller (Michigan/UNC/SAMSI), D. Paul (UC Davis/SAMSI), E. Spiller (Duke/SAMSI), and S. Venakides (Duke). Regular remote participants included F. Goetze (Bielefeld, Germany) and I. Johnstone (Stanford). A substantial fraction of the regular participants were graduate students and postdocs; this suggests that the Universality Working Group had a positive educational component.

2.5.4 Regularization and Covariance

Group Leaders: Peter Bickel, Eitan Greenshtein, Debashis Paul, Noureddine El Karoui Webmasters: Debashis Paul and Noureddine El Karoui

The general focus for this working group was regularization methods in high dimensional statistical inference with emphasis on estimation of high dimensional covariance matrices. Discussions on the general notion of regularization were centered mainly around the papers of Bickel and Li (2006) and of Breiman (1996). Two key themes of the discussions were: (i) regularization as a means of obtaining better prediction performance in high dimensional data analysis problems, and (ii) regularization as a way of selecting the correct model in a parametric statistical framework.

Unlike in classical estimation, where the minimization of squared error loss is a widely accepted procedure, many loss functions are popular in the context of estimation of the population covariance. Examples include the Frobenius norm, the Kullback-Leibler distance under a Gaussian assumption, the spectral norm, and many more. Of course, in choosing a criterion or a loss, mathematical convenience is an important issue, but a better understanding of the appropriate loss for a specific context may be desirable. In applications of regularization techniques to covariance estimation, we had discussions about estimation under structural assumptions on the population covariance matrix, and also in a nonstructural context.

Nonstructured context: Given an observed random matrix $Q$, with $\Sigma = E(Q)$, in many statistical problems the goal is to find a vector $\beta$ with the largest (or smallest) value of the quadratic form $\beta^T \Sigma \beta$, where the selection of the vector $\beta$ is based on the observed $Q$. When $Q_{p \times p}$ is a Wishart matrix based on a sample of size $n$, and $p \gg n$, there is no hope of finding a vector $\beta$, based on $Q$, that even nearly minimizes or maximizes $\beta^T \Sigma \beta$, unless we have strong assumptions on the structure of $\Sigma$. When there are no such assumptions, a reasonable approach is to limit ourselves to a subset $B$ of vectors $\beta$ satisfying certain constraints. The goal is to find nearly the best vector $\beta$ in $B$. Of course the larger $B$ is, the better. This has to do with the concept of persistence, as described in Greenshtein and Ritov (2004) (see also the tutorial by Greenshtein).
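The difficulty when p is much larger than n can be seen in a small simulation. The sketch below is an illustration under assumed settings (identity population covariance, Gaussian data, unconstrained unit vectors), not an analysis from the working group: the unit vector that minimizes the observed quadratic form achieves a value of essentially zero, which says nothing about the population quadratic form.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 50, 10                              # dimension much larger than sample size
Sigma = np.eye(p)                          # population covariance (assumed identity)
X = rng.standard_normal((n, p))            # n observations from N(0, Sigma)
Q = X.T @ X / n                            # observed matrix with E(Q) = Sigma, rank <= n

vals, vecs = np.linalg.eigh(Q)
beta = vecs[:, 0]                          # unit vector minimizing beta^T Q beta

empirical = beta @ Q @ beta                # essentially 0, since Q is singular
population = beta @ Sigma @ beta           # equals 1 for every unit vector here

print(f"beta^T Q beta     = {empirical:.2e}")
print(f"beta^T Sigma beta = {population:.3f}")
```

Restricting attention to a constrained set of vectors, as described above, is one way to prevent the search from exploiting the null space of the observed matrix.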

A popular choice of $B$ is to take it as an $\ell_1$ ball with an appropriate radius. Alternatively, one may introduce an $\ell_1$ constraint on $\beta$ as in the Lasso (Tibshirani (1996)). In practice one minimizes $\beta^T Q \beta$, subject to the $\ell_1$ constraint. Zou and Hastie (2005) proposed to consider a combination of constraints, e.g., $\ell_1$ and $\ell_2$ constraints, through a procedure they termed the 'elastic net'. A reasonable combination of constraints, proposed by Greenshtein, seems to be an $\ell_1$ constraint combined with a constraint on the estimate $\widehat{\mathrm{Var}}(\beta^T Q \beta)$ of $\mathrm{Var}(\beta^T Q \beta)$. Note that the last constraint is random. It is hoped that this additional constraint will lead to an improvement in the performance of the estimator.

Yuan and Lin (2005) considered estimation of a non-structured covariance matrix using the method of maximum likelihood under an $\ell_1$ constraint on the off-diagonal entries of the covariance matrix (see also the tutorial by Yuan). This involves a sophisticated convex optimization method. However, their setup is that of $p < n$. Peter Bickel, during one of the Berkeley node seminars, referred to similar work (with a different penalty) being carried out by Levina and Zhu. This avenue, and the study of the statistical properties of the estimators obtained by such penalized empirical risk minimization procedures, may be a fruitful area of research, particularly in the situation when $p$ and $n$ are roughly of the same size.

Structured context: We discussed several different structural assumptions about the covariance matrix that are applicable in various problems. One central question was how to translate available information about the problem into the specification of a structure for the covariance. One natural structure is that of a sparse covariance matrix. One subclass of this consists of matrices that, under an appropriate permutation of variables, have a block diagonal structure. Some discussion on the relevance of this in certain econometric problems may be found in Guhr and Kälber (2003) (see also the tutorial by Guhr), and in a tutorial by Paul. It was suggested that estimation of maximal eigenvectors could be helpful in identifying such a structure. Similar methods have been used to deal with the problem of graph partitioning. Some discussions involved the more general setting of graphical models and of finding the zeroes of the precision matrix $\Sigma^{-1}$. In this context we discussed the paper by Meinshausen and Bühlmann (2006). See also the aforementioned paper by Yuan and Lin.

Another structure that has been studied and discussed is that of a factor model. See the tutorial by Lv and the manuscript by Fan et al. (2006) for more details. Estimation of $\Sigma$ and $\Sigma^{-1}$ is studied in the latter, which also includes some discussion of the choice of loss functions. A similar model has been used in the context of functional data analysis (see also the tutorial by Paul), and recent results from random matrix theory (cf. Paul (2006)) shed light on the statistical properties of the estimates obtained under the model.

A covariance matrix of a Toeplitz form is natural in various statistical contexts, e.g., those involving a stationary time series. In Bickel and Levina (2006), estimation of a covariance matrix $\Sigma$ is studied under the assumption that $|\sigma_{ij}|$ converges to zero as $|i-j|$ approaches infinity, in such a way that the contribution of the elements away from the diagonal of $\Sigma$ becomes negligible. Here $\Sigma = \Sigma_{\infty \times \infty}$. A special case of this involves matrices that are certain additive perturbations of Toeplitz matrices. See also the tutorial by Bickel. The suggested estimation method involves banding, and the loss function is the spectral norm of the difference. Partly motivated by this approach, Zeitouni and Anderson studied the problem of banding of certain classes of random Hermitian matrices in a general context and derived new asymptotic results on the behavior of the empirical spectral measure of such matrices (see the lecture notes by Zeitouni).

Various classical statistical procedures were proposed regarding the estimation of the mean vector and covariance matrix in a high-dimensional context. As part of the Berkeley Node seminar, Eitan Greenshtein presented a method as an alternative to the Empirical Bayes procedure for estimating the mean vector in the Gaussian signal-plus-noise model. The method has excellent empirical performance even when the true signal is dense. Whether one can design an estimation procedure for the covariance that can perform well under a wide range of sparsity of the entries of the matrix is an open question. Also, Martin Wainwright gave a presentation in the Berkeley Node seminar about the model selection properties of $\ell_1$ regularization in the context of Gaussian linear regression.
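A minimal sketch of banding may help fix ideas. It is not the estimator as specified in Bickel and Levina (2006): the AR(1)-type Toeplitz covariance, the dimensions, the fixed bandwidth k, and the helper name `band` are assumptions made for illustration, and no data-driven choice of bandwidth is attempted.

```python
import numpy as np

def band(M, k):
    """Return a copy of M with every entry satisfying |i - j| > k set to zero."""
    idx = np.arange(M.shape[0])
    mask = np.abs(idx[:, None] - idx[None, :]) <= k
    return M * mask

rng = np.random.default_rng(4)
p, n, rho, k = 100, 50, 0.5, 3                       # illustrative values, p > n
idx = np.arange(p)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])   # sigma_ij = rho^|i-j| (Toeplitz)

# Draw n mean-zero Gaussian observations with covariance Sigma
chol = np.linalg.cholesky(Sigma)
X = rng.standard_normal((n, p)) @ chol.T
S = X.T @ X / n                                      # sample covariance

# Loss: spectral norm of the difference from the population covariance
err_sample = np.linalg.norm(S - Sigma, ord=2)
err_banded = np.linalg.norm(band(S, k) - Sigma, ord=2)

print(f"spectral-norm error, sample covariance: {err_sample:.2f}")
print(f"spectral-norm error, banded estimate:   {err_banded:.2f}")
```

Banding discards the many noisy entries far from the diagonal at the cost of a small bias, which is the trade-off the decay assumption on the off-diagonal entries is meant to control.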

On the application side, Young Truong gave an insightful lecture on fMRI data and the potential regularization procedures that can be utilized to answer some of the questions.

In addition to these activities, members of this working group actively participated in collaborative activities with the members of other working groups. We give a brief description of these activities below.

• Interaction with Climate group: Problems in climatology and atmospheric science involve dealing with large data on processes that vary in both space and time. Computation of large covariance matrices is a part of many well-known methods (e.g., Kalman filtering) that are used in these contexts. Our interaction with the Climate group involved learning about the problems arising from these fields and assessing the possibility of developing statistical methodologies for dealing with the questions that arise. Cari Kaufman gave a talk on the use of tapering for an efficient computation of estimates of parameters describing a spatial process. There is clearly a lot of scope for collaborative work with scientists working in this area.

• Interaction with Universality group: Many inference questions relating to large dimensional covariance matrices can be addressed by using tools from random matrix theory. The interaction with the Universality group provided an opportunity to learn about some of the latest developments in this fast-growing field.

• Interaction with Graphical models group: We discussed several applications in the context of structured covariance matrices that can be formulated in the framework of Gaussian graphical models. Some model selection issues in this context were addressed through interactions between our group and the Graphical models group.

References

1. Bickel, P., and Levina, E. (2006): Regularized estimation of large covariance matrices. Technical report.

2. Bickel, P., and Li, B. (2006): Regularization in Statistics. Test, 15, 271-344.

3. Breiman, L. (1996): Heuristics of instability and stabilization in model selection. Annals of Statistics, 24, 2350-2383.

4. Fan, J., Fan, Y., and Lv, J. (2006): High dimensional covariance matrix estimation using a factor model. Technical report.

5. Greenshtein, E., and Ritov, Y. (2004): Persistence in high dimensional linear predictor selection and the virtue of over parametrization. Bernoulli, 10, 971-988.

6. Guhr, T., and Kälber, B. (2003): A new method to estimate the noise in financial correlation matrices. Journal of Physics A: Math. Gen., 36, 3009-3032.

7. Meinshausen, N., and Bühlmann, P. (2006): High dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34, 1436-1462.

8. Paul, D. (2006): Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, (to appear).

9. Tibshirani, R. (1996): Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267-288.

10. Yuan, M., and Lin, Y. (2005): Model selection and estimation in the Gaussian graphical model. Biometrika, (to appear).

11. Zou, H., and Hastie, T. (2005): Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67, 301-320.

2.5.5 Geometric Methods

Group Leader: Makram Talih Webmaster: Jayanta Pal

After the initial organizational meeting, participants in the Geometric Methods working group decided to focus on building an intuitive understanding of the geometry of the cone of symmetric positive definite matrices, with a view to exploiting its intrinsic structure for modeling and inference in high-dimensional covariance matrices. To achieve this goal, the group discussed two important papers in the field: one by Moakher (2005) on defining the geometric mean of symmetric positive-definite matrices using the notion of geodesic curves and on building a gradient-descent algorithm for its numerical approximation; the second by Fletcher & Joshi (2004), who, in the context of diffusion-tensor magnetic resonance imaging, introduce a form of principal component analysis, termed principal geodesic analysis, whose modes of variation are geodesic lines in the cone of diffusion tensors (i.e., 3×3 variance-covariance matrices).
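For two matrices, the geometric mean defined through geodesics has a closed form, which the following minimal sketch computes as the midpoint of the geodesic joining them under the affine-invariant metric. This is an illustration only: it covers the two-matrix case, whereas the mean of more than two matrices requires an iterative scheme such as the gradient descent mentioned above, which is not shown. The function names and the example matrices are assumptions.

```python
import numpy as np

def spd_power(A, t):
    """Compute A**t for a symmetric positive definite A via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * vals ** t) @ vecs.T

def geodesic(A, B, t):
    """Point at parameter t on the geodesic from A (t = 0) to B (t = 1)."""
    A_half = spd_power(A, 0.5)
    A_inv_half = spd_power(A, -0.5)
    return A_half @ spd_power(A_inv_half @ B @ A_inv_half, t) @ A_half

# Two arbitrary symmetric positive definite matrices (illustrative values)
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.3], [-0.3, 3.0]])

G = geodesic(A, B, 0.5)          # geometric mean of A and B
print(G)
```

The midpoint G satisfies G A^{-1} G = B and det(G)^2 = det(A) det(B), the matrix analogues of the scalar identity g^2 = ab.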

The next order of business for the group, and one of the group's stated main themes of research, has been to investigate probability distributions on covariance manifolds, in particular the cone of symmetric positive definite matrices, which is looked upon as a Riemannian manifold. We were especially interested in reading about some recent findings by a research group at INRIA, based in Sophia-Antipolis, France, especially Lenglet et al. (2004) and Pennec (2004). Working group member Dr. Armin Schwartzman was keenly active in this endeavor, and gave an informal talk about his own research on distributions for positive definite matrices on 11/30/06. He was also later invited to present his findings in the follow-up workshop on Geometry, Random Matrices, and Statistical Inference, held at SAMSI in January 2007. The study of probability distributions on the cone of positive definite matrices is intimately related to the question of defining a Normal distribution thereon, since the latter is the principal paradigm for such distributions, but also since it appears as the limiting distribution in the central limit theorem. Thus, the group benefited greatly from close interaction with another of the HDRM working groups, namely the Multivariate Distributions working group, with which most group members were also affiliated. Prof. Don Richards, especially, was instrumental in leading the discussion on 11/30/06 about the important role of the Helgason-Fourier transform in the derivation of a central limit theorem on spaces of positive definite matrices. On 12/14/2006, Prof. Richards also gave a presentation on minimax estimation for the deconvolution problem over the space of positive definite matrices.

The theme of inference for time-varying structures has been mainly taken up by the group's leader, Dr. Makram Talih, who has presented work on constructing a Markov chain on the cone of positive definite matrices such that the path between consecutive points is a geodesic segment. Dr. Talih's line of research is motivated by the problem of estimating the covariance structure in the multivariate Normal model when the underlying matrix is driven by a hidden Markov chain. Dr. Talih's work has greatly benefited from the working group's input and suggestions, and was featured in the Bayesian focus week in October 2006, as well as in the SAMSI Education and Outreach program's two-day undergraduate workshop in November 2006.

Motivated by the HDRM program's regularization theme, the geometric methods working group has also been discussing the use of geodesic distance between variance-covariance matrices in defining sensible loss functions that are invariant to inversion and orthogonal transformations. In this respect, the geometric perspective provides valuable insight for the study of shrinkage and regularization.
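The invariance properties mentioned here can be checked numerically. The sketch below assumes the affine-invariant geodesic distance, computed as the square root of the sum of squared logarithms of the eigenvalues of A^{-1}B; the report does not specify which geodesic distance the group used, so this choice, the helper names, and the random test matrices are assumptions.

```python
import numpy as np

def geodesic_distance(A, B):
    """Affine-invariant distance between symmetric positive definite A and B."""
    L = np.linalg.cholesky(A)                 # A = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ B @ L_inv.T                   # symmetric, same eigenvalues as A^{-1} B
    lam = np.linalg.eigvalsh(M)
    return np.sqrt(np.sum(np.log(lam) ** 2))

rng = np.random.default_rng(5)

def random_spd(p):
    """Generate a random symmetric positive definite p x p matrix."""
    Z = rng.standard_normal((p, 2 * p))
    return Z @ Z.T / (2 * p) + 0.1 * np.eye(p)

A, B = random_spd(4), random_spd(4)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # a random orthogonal matrix

d = geodesic_distance(A, B)
d_inverse = geodesic_distance(np.linalg.inv(A), np.linalg.inv(B))
d_rotated = geodesic_distance(U @ A @ U.T, U @ B @ U.T)

print(f"d(A, B)             = {d:.6f}")
print(f"d(A^-1, B^-1)       = {d_inverse:.6f}")
print(f"d(U A U^T, U B U^T) = {d_rotated:.6f}")
```

A loss built on this distance therefore treats the estimation of a covariance matrix and of its inverse, the precision matrix, symmetrically, which is not true of the Frobenius or spectral norm of the difference.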

2.5.6 Multivariate Distributions

Group Leaders: Don Richards, Iain Johnstone Webmaster: Jayanta Pal

The group leaders, Iain Johnstone and Don Richards, with assistance from Jayanta Pal, organized the group activities by finding speakers for each weekly meeting, coordinating group activities with the SAMSI staff and directorate, and maintaining contact between group members located at SAMSI or elsewhere.

The webmaster, Jayanta Pal, maintained the group's web page. The web page was updated at least twice weekly, listing upcoming group activities, talks and discussions, papers and smartboard notes, and interactions with other working groups. A key feature of the group meetings was the participation of off-site members through the WebEx system; this enabled high virtual attendance by off-site group members. The website has been maintained into this semester and contains all notes and papers which formed the basis for group discussions and activities.

Throughout the semester, there also was strong interaction with the Working Group on Geometric Methods. Makram Talih, leader of the Geometric Methods group, was particularly helpful in fostering this joint interaction. As a consequence, the two groups held three joint weekly meetings. At other meetings of the Geometric Methods group, members of the Multivariate Distributions group offered their perspectives on discussions ranging from Central Limit Theorems on spaces of positive definite matrices to concepts of geometric means for positive matrices.

Members of the Working Group on Multivariate Distributions and their affiliations are as follows: Jeongyoun Ahn (University of Georgia), Edo Airoldi (Carnegie-Mellon University), Leonard Choup (UC-Davis), Noureddine El Karoui (UC-Berkeley), Friedrich Goetze (Bielefeld University), Thomas Guhr (Lund Inst. of Technology), Iain Johnstone (Stanford University), Plamen Koev (MIT), Stas Kolenikov (University of Missouri), Yoshihiko Konno (Japan Women's University), Helene Massam (York University), Jing Naihuan (NC State University), Jayanta Pal (SAMSI), Debashis Paul (UC-Davis), Xingye Qiao (UNC), Bala Rajaratnam (SAMSI), Don Richards (Penn State University), Igor Rumanov (UC-Davis), Dhruv Sharma (NC State University), Leonard Stefanski (NC State University), Ming Yuan (Georgia Tech).

The weekly meeting activities of the group were as follows:

Date: 09/28/06 Speaker: Bala Rajaratnam Topic: Marginal likelihood for the eigenvalues of covariance matrices

Date: 10/05/06 Speaker: Donald Richards Topic: Multimodality of the likelihood function for the Behrens-Fisher problem

Date: 10/12/06 Speaker: Donald Richards Topic: An introduction to zonal polynomials and hypergeometric functions of matrix argument

Date: 10/19/06 Speaker: Plamen Koev Topic: The combinatorial definition of the Schur, zonal, and Jack polynomials

Date: 10/26/06 Speaker: Donald Richards Topic: Generalizations of the Wishart distribution arising from monotone incomplete multivariate normal data

Date: 10/30/06 - 11/3/2006 The group members attended the SAMSI program, "Bayesian Focus Week"

Date: 11/9/06 - 11/11/2006 The group members attended the SAMSI program on "Large Graphical Models and Random Matrices"

Date: 11/16/06 Speakers: Jayanta Pal and Donald Richards Topic: Discussion on MIMO capacities, representation theory and eigenvalue computations.

Date: 11/30/06 Speaker: Noureddine El Karoui Topic: Finite point processes and eigenvalues of random matrices

Date: 12/07/06 (Joint meeting with the working group on Geometric Methods) Speaker: Hongtu Zhu, UNC Chapel Hill Topic: Statistical analysis of diffusion tensors in diffusion-weighted magnetic resonance image data

Date: 12/14/06 (Joint meeting with the working group on Geometric Methods) Speaker: Makram Talih Topics: 1. Loss functions on the space of covariance/precision matrices. 2. Matrix-valued martingales/random walks

Date: 12/14/06 (Joint meeting with the working group on Geometric Methods) Speaker: Donald Richards Topics: Diffusion tensor imaging, and deconvolution density estimation on spaces of positive definite symmetric matrices

2.5.7 Graphical Models/Bayesian Methods

Group Leader: Helene Massam Webmaster: Bala Rajaratnam

2.5.7.1 Identification of issues of interest

From the opening workshop that took place from September 17th to 20th, 2006, it was clear that research in the area of graphical models had common threads with several of the research areas of various speakers at the Opening Workshop. The following most important common threads were identified:

(a) Moment computations for Wishart matrices. H. Massam gave a talk on this topic at the opening workshop. The results are closely linked to the work presented the following day by James Mingo and Raj Rao.

(b) Estimation of large random matrices. Dealing with large random matrices is why statistical graphical models were created. One of the advantages of graphical models is the reduced number of parameters. This is clearly linked to the concept of regularization, within the broader area of high-dimensional inference, which was at the heart of at least two talks, by Bickel and Levina and by Lafferty and Wasserman.

(c) The study of various extensions of the Wishart distribution. Extensions and modifications of the Wishart distribution arise naturally in the context of graphical models. The fascinating talk by Ioana Dumitriu gave us a glimpse into the various β-ensembles of random matrices and the behaviour of the corresponding eigenvalues. Further studies of the Wishart extensions in statistics will certainly benefit from results already obtained by researchers such as Dumitriu.

(d) Last but certainly not least, model selection, which was the common underlying thread of many of the “applied” talks.

Parameter estimation and model selection problems are at the heart of much of the work in Bayesian statistics and the combination of graphical models and Bayesian statistics was a natural one. The topics mentioned above were therefore identified at the end of the Opening Workshop as some of the possible areas of research for our GMBM group.

2.5.7.2 The participants

Twenty-six people signed up to be part of our group and attended the weekly meetings on various occasions. The regular participants were Eitan Greenshtein, Jinchi Lv, Hélène Massam, Debashis Paul, Bala Rajaratnam, Makram Talih and Zheng Lei. Jim Berger and Carlos Carvalho also joined us for many meetings.

2.5.7.3 Activities

1. Weekly two-hour GMBM meetings where papers pre-assigned by the group leader were read and commented on. The reading was usually directed by the group leader at the beginning but always turned into a deep and lively discussion of the topic considered. As listed on the SAMSI website, the papers read in the first few weeks were

• Bickel and Levina (2006) Regularized estimation of large covariance matrices

• Meinshausen and Bühlmann (2006) High dimensional graphs and variable selection with the LASSO

• Dobra and West (2004) Bayesian Covariance Selection

• Jones et al. (2005) Experiments in Stochastic Computation for High-Dimensional Graphical Models

• Wainwright and Jordan (2003)

The last three weekly meetings centered on the presentation of the paper by Berger and Sun (2006) by Dongchu Sun (two sessions) and a presentation and discussion on reference priors by Jim Berger.

2. Two workshops

• Bayesian Focus Week (Oct. 30-Nov. 3, 2006), organized by Hélène Massam, a week-long workshop with twenty speakers from the US, Canada and Europe.

• Large Graphical Models and Random Matrices (Nov. 9-11, 2006), organized by Nanny Wermuth, a two-and-a-half-day workshop with seventeen speakers from the US, Canada and Europe.

Of course, many discussions took place during these two workshops and several close scientific contacts were established. The organizers of both workshops received invaluable assistance from B. Rajaratnam.

3. Research and results

At this moment, it is difficult to fully identify all the scholarly work that is or will be done as a consequence of our work at SAMSI during the Fall of 2006. However, we can already say the following three papers are being written or have been written:

• Bala Rajaratnam, Hélène Massam and Carlos Carvalho are writing a paper on covariance estimation which they intend to submit to the Annals of Statistics for the special SAMSI issue that has been planned.

• Hélène Massam, Debashis Paul and Bala Rajaratnam are writing a paper on model search for discrete log-linear models which should be completed by the summer of 2007.

• Eitan Greenshtein, Junyong Park and Ya’acov Ritov have completed the writing of a paper entitled “Estimating the mean of high valued observations in high dimensions”.

It should be noted that the first two papers are the direct product of the cooperation facilitated by the program on Large Random Matrices. The ideas were discussed shortly before the fall but the work was done entirely during this 2006-2007 academic year.

4. Training of a postdoctoral fellow

Bala Rajaratnam is a postdoctoral fellow at SAMSI for the academic year 2006/2007. He is co-sponsored by Iain Johnstone and Hélène Massam. During the fall of 2006, he was introduced to the research area of graphical models and Bayesian inference. He now has a good basic working knowledge of this topic and he is forging ahead at a fast pace. I expect him to be an independent member of this research community within a few months.

5. Links established between different research areas

The work described above under “Weekly readings” and “Research and Results” is mostly concerned with topics (b) and (d) described above in “Identification of issues of interest”. We have already discussed problems in (a) and (c) that we deem interesting. These and other relevant topics were further discussed during the AIM workshop which took place from April 11 to April 14, 2007.

2.6 Spring Semester Concentration on Geometry

The continuation began with a four-day workshop. Both algorithms and the fundamental mathematical objects computed by the algorithms were stressed. One objective of the workshop was to provide computer scientists, statisticians, probabilists, geometers, and topologists an opportunity to interact. The topics that were covered in this workshop were Geometry and Sparsity, Geometry and Topology, Machine Learning, and Random Matrices and Covariances.

After this initial workshop an intensive working group and class formed, focusing on theoretical and applied aspects of topological statistics. The main effort in this program has been twofold, and the mathematical objects focused upon have been persistent homology and persistence diagrams. The first topic the group has addressed is a formal notion of consistency for persistence diagrams. A corollary of this is a proof of convergence of homology inference from a point cloud. This approach improves upon results of Niyogi, Smale, and Weinberger. The more relevant part of the work has focused on bringing statistical notions of uncertainty to persistence. Two approaches have been proposed and are currently being implemented: one based upon bootstrap estimates, the other based on Bayesian density estimation procedures. As of this report, this is an ongoing activity that will continue through the Spring Semester, 2007.

2.7. Outcomes

The product of the working group activities will primarily be represented by publications, student projects and the initiation of ongoing collaborations.

A key impetus for the program was the idea that a variety of problems in high dimensional statistics would benefit by the infusion of results and techniques from an area of mathematics (random matrix theory) that has long focused on many-variable approximations (the “thermodynamic limit”). Conversely, a goal for the program has been to stimulate new research topics and directions for applied mathematicians.

Peter Bickel is the guest editor in charge of a special issue of the Annals of Statistics devoted to articles written by participants in the program and in the workshops.

3. Multiplicity and Reproducibility in Scientific Studies

3.1 Objectives

Concerns over multiplicities in statistical analysis and reproducibility of scientific experiments are becoming increasingly prominent in almost every scientific discipline, as experimental and computational capabilities have vastly increased in recent years. This 2006 SAMSI summer program looked at the following key issues.

Reproducibility: A scientist plans and executes an experiment. A clinical trials physician runs a clinical trial, assigning patients to treatments at random and blinding the treatment assignments. A survey sampler collects a survey. Scientists use statistical methods to help them judge whether something has happened beyond chance. They expect that if others replicate their work, a similar finding will result. To clear a drug the FDA requires two studies, each significant at 0.05. There is growing concern that a number of such scientific studies are not replicating, and understanding the reasons for this was a top priority for the program.

Subgroup Analysis: Large, complex data sets are becoming more commonplace and people want to know which subgroups are responding differently from one another and why. The overall sample is often quite large, but subgroups may be very small and there are often many questions. Genetic data are being collected in clinical trials. Which patients will respond better to a drug and which will have more severe side effects? Disease, drug response, or side effects can result from different mechanisms. Identification of subgroups of people where there is a common mechanism is useful for diagnosis and prescribing of treatment. Large educational surveys involve groups with different demographics, different educational resources and subject to different educational practices. Which groups are different; how are differences related to resources and practices? What really works and why? Is the finding the result of chance? There is a need for effective statistical methods for finding subgroups that are responding differently. There is a need to be able to identify complex patterns of response and not be fooled by false positive results that come about from multiple testing. The program brought together statisticians and subject experts to develop and explore statistical strategies to address the subgroup problem. The benefit will be credible statistical methods that are likely to produce results that will replicate in future studies.

Massive Multiple Testing: The routine use of massively multiple comparisons in inference for large scale genomic data has generated controversy and discussion about appropriate ways to adjust for multiplicities. The program studied different approaches to formally describe and address the multiplicity problem, including the control of various error rates, decision theoretic approaches, hierarchical modeling, probability models on the space of multiplicities, and model selection techniques. Applications include genomic data, clinical trial design and analysis, record matching problems, classification in spatial inference, and anomaly discovery and syndromic surveillance. The goal of the program was to identify the relative merits and limitations of the competing approaches, and to understand which features of reproducibility are addressed.
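As a small numerical illustration of why multiplicity matters, the sketch below computes the chance of at least one false positive among m tests, each run at level 0.05, and the chance that two studies are both falsely significant at 0.05. It assumes independent tests with all null hypotheses true, which is a simplification made only for illustration; the function names are hypothetical and not part of the program's work.

```python
def familywise_error(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent
    tests of true null hypotheses, each run at level alpha.
    (Independence is an illustrative assumption.)"""
    return 1.0 - (1.0 - alpha) ** m


def all_significant_by_chance(alpha: float, n_studies: int = 2) -> float:
    """Probability that n_studies independent studies of a true null
    hypothesis are all significant at level alpha."""
    return alpha ** n_studies


if __name__ == "__main__":
    # The error probability grows quickly with the number of tests.
    for m in (1, 10, 100):
        print(f"m = {m:3d}: P(at least one false positive) = "
              f"{familywise_error(0.05, m):.4f}")
    # Requiring two independent significant studies is far more stringent.
    print(f"two studies at 0.05: {all_significant_by_chance(0.05):.4f}")
```

Under these assumptions, a single uncorrected test has a 5% false positive probability, while a modest number of uncorrected subgroup tests makes at least one false positive likely.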

3.2 Workshops

3.2.1 Opening Workshop

The Kickoff Workshop was held July 10-12, 2006, and focused on clear formulation of the challenges in the area, especially the issues of reproducibility of scientific experiments, massive multiple testing, and subgroup analysis. The workshop set the stage for the subsequent Program research, through tutorials, talks and discussion sessions designed to focus attention on the key problems to be considered by the program working groups.

3.2.2 Transition Workshop

An informal Transition Workshop was held July 27-28, 2006, summarizing the results of the Program research and discussing the remaining challenges.

3.3 Working Groups and Outcomes

3.3.1 Report on Subgroup Analysis working group

The activities of this working group combined consideration of background papers, general methodological talks, and related discussions with emphasis on practical advice to researchers on how to handle multiplicity in their studies and particularly how to report results. The focus of this advice was primarily on epidemiological studies and clinical trial research in medicine. Much of our background preparation involved becoming acquainted with two types of published articles:

(a) the many publications giving advice on how to (or whether to) carry out subset analyses, whether planned or unplanned, in such research, taking into account the problems of possible inflation of errors due to multiple testing. Many of these articles give good advice, but many such studies are still being conducted without following that advice.

(b) articles documenting the large number of reported results of clinical trials and epidemiological studies that are not confirmed in follow-up research, many of which include subgroup analyses. It seems clear that a major contributor to this lack of confirmation is inattention to, or inadequate appreciation of, the effects of multiple testing.

Our group consisted of both Bayesians and frequentists. While we may have had differences in the ways in which we approach problems, we had broad areas of agreement on the basic issues. We had some preliminary discussions about types of subgroups and subgroup analyses. One way of dividing such analyses is in terms of whether they concentrate on (i) a set of demographically-defined subgroups, e.g. groups defined in terms of race-ethnicity, geography, genomic classification, etc., to see which show either effects of a treatment or effects that differ from those of other such subgroups, or (ii) what might be called outcome-based analysis: classifying patients into groups based on their treatment outcomes, and looking for demographically defined subgroups or covariates that predict the differences, e.g. in a medical study looking at subgroups of patients in which the treatment is effective, not effective, or harmful. Studies can focus on one of these aspects or combine them.

The former approach often starts with a small number of defined groups, although genomic analysis has recently led to much larger numbers. In this context subgroup analysis sometimes involves examining outcomes separately within each subgroup, whether or not a treatment has apparently significant effects in the total group, and sometimes comparing sizes of effects in different subgroups. The latter usually considers large numbers of covariates. The former type often uses methods such as analysis of variance and regression; the latter often involves clustering methods. Some of the talks in our session fit into each of these approaches; e.g. Robert Obenchain's talk dealt mostly with the outcome-based analysis while Siva Sivaganesan took primarily the defined subgroup approach.

We appreciate the fact that in complex studies it is not easy to plan analysis and reporting taking multiplicity into account. Rather than proposing specific methods, or specific types of error control (e.g. control of family-wise error, false discovery rates, consideration of posterior probabilities, etc.), which in any case would depend on many characteristics of the studies, we decided to propose a three-level plan for taking multiplicity into account that would work with a variety of specific types of studies, types of error control, and details of analytic plans. In fact, our advice is general enough to cover all types of tests, not only those based on subgroup analysis.

Recognizing that many researchers will explore their data and are anxious to glean as much as possible from the results, we propose some compromises between strict adherence to pre-determined protocols and unlimited data dredging in reporting study results. We suggest that researchers dedicate a 5% significance level for testing a small, targeted group of primary outcomes (perhaps only one) and an additional level, possibly 5%, for testing an entire set of pre-designated secondary outcomes. Any additional outcomes that draw attention, either because of very small p-values or large estimated effect sizes, should be reported as speculation, unsupported by good statistical methods even if those p-values or estimated effect sizes appear to be unusually compelling. In some cases it may be possible to estimate an overall error probability for this exploratory set, but in most cases it will not be possible. Any theory or empirical evidence supporting these additional results should be presented, but the results remain speculation unless supported by additional targeted research. We propose that the level at which secondary outcomes are tested should be at most the original level. One advantage of using the original level is that if some secondary outcomes are very highly correlated with the original ones, there is no loss of power in using that level, assuming appropriate consideration is given to the correlations within the secondary class.

We made plans for a joint paper or papers with our recommendations and with detailed information, general analytic advice, and supporting discussion of concrete examples. No further progress has been made on this paper to date, but a number of activities, stimulated by our discussions and presentations, and aided greatly by the note-taking of Rochelle Trachtenberg, have been undertaken by participants in the workshop.
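The three-level reporting plan proposed by the working group can be sketched in code as follows. This is only an illustrative sketch, not the group's method: the group deliberately left the type of error control open, so the Bonferroni split within each family is an assumption made here for concreteness, and the function name, labels, and example outcomes are hypothetical.

```python
def classify_outcomes(primary, secondary, exploratory,
                      alpha_primary=0.05, alpha_secondary=0.05):
    """Label outcomes under a three-level plan.

    Each argument maps an outcome name to its p-value.
    Assumption for illustration: each pre-designated family is tested
    with a Bonferroni split of its overall level.
    """
    # The plan asks that secondary outcomes be tested at no more than
    # the level used for the primary outcomes.
    if alpha_secondary > alpha_primary:
        raise ValueError("secondary level should be at most the primary level")

    report = {}
    families = (("primary", primary, alpha_primary),
                ("secondary", secondary, alpha_secondary))
    for label, pvalues, alpha in families:
        if not pvalues:
            continue
        threshold = alpha / len(pvalues)  # Bonferroni split (assumed)
        for name, p in pvalues.items():
            verdict = "significant" if p <= threshold else "not significant"
            report[name] = f"{label}: {verdict}"

    # Everything else is reported as speculation, however small the p-value.
    for name in exploratory:
        report[name] = "exploratory: speculation pending targeted research"
    return report


if __name__ == "__main__":
    result = classify_outcomes(
        primary={"mortality": 0.01},
        secondary={"hospitalization": 0.02, "quality_of_life": 0.04},
        exploratory={"effect_in_subgroup_X": 0.0001},
    )
    for outcome, label in result.items():
        print(f"{outcome}: {label}")
```

The point of the sketch is the last loop: an outcome outside the pre-designated families is labeled as speculation regardless of how compelling its p-value appears.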

Some activities that have been influenced by the program:

1. Siva Sivaganesan, Prakash Laud and Peter Mueller continued work on a joint project on a Bayesian approach for subset analysis in clinical trials. Discussions in the workshop provided important background and motivation for continued work on this project. The proposed approach is on the interface of the two workshop themes "subset analysis" and "multiplicities." Besides the discussions in the subgroup working group we also gained important insights for this project from interactions with colleagues in the multiplicity working group. Results have been presented at the recent ENAR meeting. A draft manuscript is in preparation.

2. Stanley Young and Juliet Shaffer co-sponsored a three-hour symposium at the February 2007 meeting of the American Association for the Advancement of Science entitled "Mixed health messages: Observational versus randomized trials" with five speakers and three discussants. Juliet Shaffer spoke on "History of multiple testing" with much content informed by workshop discussions. (The workshop was well-attended with lively ensuing discussion.)

3. Juliet Shaffer was a participant in the Grants Award Conference sponsored by the American Educational Research Foundation from Sept. 28-Oct. 1, 2006. This conference was to give research advice to graduate students and new PhD's who had received AERA research grants. Juliet had been involved in preparation of the ASA-NSF-sponsored report "Using Statistics Effectively in Mathematics Education Research" and gave a report on this guide, also emphasizing issues covered in the SAMSI workshop, especially noting the importance of replication and results of replications of medical studies. (In discussion following the report, students noted that they are often discouraged from replicating studies, informed that "prestige" requires innovative research.)

4. At the Third Erich L. Lehmann symposium, to be held in May 2007, there will be two sessions related to and influenced by the workshop: (a) a session organized by Jim Berger on the workshop in general, and (b) a session organized by Juliet Shaffer on multiplicity issues related to the workshop topics. One of the speakers in the latter session will be Charles Lewis, who will speak about similar multiplicity issues in education and psychology, areas for which the workshop material is relevant but which were not addressed there.

5. The workshop influenced Juliet Shaffer to work on some multiple comparison issues in clustering which is now ongoing with two psychologists, Harvey Keselman and Rhonda Kowalchuk. Some results will be reported at the Fifth International Conference on Multiple Comparison Procedures to be held in July, 2007.

6. Bob Obenchain (Eli Lilly) has continued his "meaningful subgroup" research on adjustment for all forms of bias in observational (nonrandomized) studies. Through systematic application of sensitivity analyses, Bob's "Local Control" (LC) approach forms, splits, compares and (following over-shooting) re-combines subgroups of most comparable patients. This approach has considerable appeal to Bayesians because, without imposing any prior distribution, it reveals and smooths the sample distribution of local treatment differences (LTDs). Rather than focus on point estimates in very large "samples" and their highly questionable p-values, Bob's approach uses a simple nested ANOVA model (treatment within patient cluster) and forms non-parametric confidence and tolerance intervals or regions. Bob presented an update of his LC concepts at the March 2007 ENAR meeting and is also working with Doug Faries (Lilly), Andrew Leon (Cornell) and Josep Maria Haro (Barcelona) to publish a book on "Analysis of Observational Health-Care Data."

3.3.2 Report on Massive Multiple Testing (MMT) working group

The plan for this working group was to study different approaches, consider applications in inference for genomic data and other research problems that require massive multiple testing. The goal was to identify the relative merits and limitations of competing approaches for diverse applications.

3.3.2.1 Workshop, Lectures and Group Discussions

Speakers and topics for the talks in the opening workshop were chosen to reflect the diversity of approaches proposed in the recent literature. Different approaches that were discussed included model-based Bayesian approaches, frequentist false discovery rate control, decision theoretic approaches and theoretical considerations of the false discovery process as a function of a threshold for the rejection region. Literature related to these approaches has developed largely separately, with most researchers working exclusively in one research direction. The workshop provided an opportunity for all participants to learn about current problems and ideas across different approaches. The workshop program intentionally mixed talks in the three big program areas: reproducibility, subgroup analysis, and massive multiple testing. Although all three areas depend on common underlying mathematical principles, recent research in these areas has developed separately. Similar to the exchange among researchers working in different research areas within multiple testing, exchange of ideas across the three working groups led to interesting new applications and research projects reported below.

3.3.2.2 Working Group Discussions and Results

After the opening workshop the MMT working group continued to meet almost daily, usually inviting one participant to present an in-depth discussion of a selected research topic. This mechanism and informal exchange among participants working in the different working groups naturally led to interesting new research projects. We report some examples that we are aware of.


Loss function for credible intervals: In informal discussions between Dani Yekutieli and Ken Rice the question of a formal decision theoretic justification of Bayesian credible intervals was brought up. This was in reaction to earlier talks that pointed out various derivations of FDR-based rules as Bayes rules on one hand, and a review of methods to control FDR-type error rates for frequentist confidence intervals on the other hand. Despite the routine use of credible intervals in Bayesian data analysis, it seemed there was no good reference to derive them as formal Bayes rules. In following research Ken Rice developed an appropriate argument. This has meanwhile led to a draft manuscript that will likely become a classic reference.

Bayesian justification of FDR rules: Another discussion involved Yoav Benjamini, Ken Rice and others. In one of the daily lectures S. Bayarri conjectured that it was not possible to justify FDR control as a Bayes rule from first principles, without including realized FDR as an explicit part of the loss function. Several alternatives were discussed that approximately lead to rules similar to Benjamini and Hochberg’s FDR control. Ken Rice has meanwhile developed an argument based on a loss function for credible intervals. The rule approximately leads to FDR control.
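For readers unfamiliar with the rule discussed above, the following is a minimal sketch of the standard Benjamini and Hochberg step-up procedure for controlling the false discovery rate at level q. It illustrates the frequentist rule only, not the Bayesian argument developed by Ken Rice; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, in the original order of pvalues,
    marking which hypotheses are rejected at FDR level q.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Find the largest rank k with p_(k) <= k * q / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k = rank

    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    example = [0.02, 0.03, 0.035, 0.9]  # hypothetical p-values
    print(benjamini_hochberg(example, q=0.05))
```

The step-up character of the rule matters: a hypothesis whose own p-value exceeds its rank threshold can still be rejected if a larger p-value further down the sorted list meets its threshold.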

Bayesian subgroup analysis and edge detection: A graduate student participant, James Scott, reports that the workshop motivated three projects that will likely constitute part of his thesis. The projects are fully Bayesian treatment of subgroup analysis, variable selection with multiplicity control, and multiplicity selection for edge detection in spatial inference.

Gene-environment interactions: Woncheol Jang has developed two multiple testing projects as a result of his workshop participation. In one project he will consider models with gene-environmental interactions that include multiplicity control. In the second project he will consider multiple testing for high dimensional data as they arise from microarray experiments.

Bayesian subgroup analysis: Siva Sivaganesan, Prakash Laud and Peter Müller developed an approach for subgroup analysis based on Bayesian multiplicity control. The project is also reported as an outcome of the subgroup analysis working group.

H. Finner and T. Dickhaus report the following research nuggets:

• FDR control: Discussions with Yoav Benjamini, Daniel Yekutieli and Sanat K. Sarkar on possible improvements of FDR-controlling procedures.

• Dependence: Discussions on volatility of the FDR proportion under dependency.

• Gene expression data: Learning about genome-wide association studies / SNP data analysis (discussion of Lei Sun)

• Asymptotics: Large p, small n asymptotics

• New projects: Two new projects were initiated, “Asymptotic improvements of some FDR procedures based on an asymptotically optimal rejection curve” and “Optimal rejection curve for FDR control in various models.”

3.3.2.3 Conference and Workshop Presentations

Participants at the SAMSI workshop have organized sessions in several prominent statistics meetings that will highlight results from the SAMSI workshop. Stan Young set up a 3-hour invited session at the annual meeting of the AAAS (American Association for the Advancement of Science). An attending reporter for the Economist wrote a short article that appeared in the print edition of the Economist. Jim Berger and Peter Müller organized a session at the ENAR (Eastern North American Region of the Biometric Society) meeting in Atlanta, with talks related to subgroup analysis and multiplicity control. The session was well attended and highlighted results from the SAMSI workshop. Jim Berger and Juliet Shaffer are organizing two sessions at the upcoming Lehmann Symposium at Rice University, Houston. The diverse nature of the AAAS, ENAR and Lehmann Symposium meetings reflects the universal importance of the SAMSI program, and the diversity of the program participants.

3.3.2.4 Manuscripts

The following papers will acknowledge SAMSI support.

• Ken Rice, “Bayesian Decision Theory for Multiple Comparisons”

• Siva Sivaganesan, Prakash Laud and Peter Mueller, “Subgroup analysis - a Bayesian decision theoretic approach”

Ken mentions: “Draft title above, I am working on the draft paper right now! It will say extremely nice things about SAMSI in the acknowledgements, I wouldn’t be working on this without the workshop opportunity.”

H. Finner and T. Dickhaus report that the workshop motivated them to write up the following paper:

• H. Finner, T. Dickhaus and M. Roters, “On the false discovery rate and an asymptotically optimal rejection curve,” submitted for publication.

4. Education and Outreach Program

The SAMSI Education and Outreach (E&O) Program encompasses a variety of activities which have achieved national stature for both their scientific and pedagogical content. The annual activities include two-day Undergraduate Outreach Days held in November and March, a week-long Undergraduate Workshop (UGS) held in May, and the ten-day Industrial Mathematical and Statistical Modeling (IMSM) Workshop for Graduate Students that is held at the end of July.

Undergraduate Outreach Days:

The two outreach workshops are held annually to expose undergraduates from programs around the country to topics and research directions associated with concurrent SAMSI programs. One goal of these workshops is to illustrate the application and synergy between mathematics and statistics which goes far beyond that which students have seen in coursework. The overall objective is to broaden the perspective of students with regard to both future graduate studies and career choices.

High Dimensional Inference and Random Matrices: November 17-18, 2006

The November outreach workshop focused on topics from the SAMSI Program on High Dimensional Inference and Random Matrices. The students were provided with an overview of SAMSI by Ralph Smith (SAMSI-NCSU) after which program leaders, participants, postdocs and students gave a variety of presentations and tutorials. During the Friday morning session, Makram Talih (Hunter College-CUNY) and Serge Guillas (Georgia Tech) gave overviews regarding statistical analysis and statistical models for climate change. During the first session of the afternoon, Raj Rao (MIT) led a MATLAB tutorial on “Numerical Experiments in Random Matrix Theory” in which students were led to postulate results related to the 2006 Fields Medal theory of Okounkov and Anderson’s 1977 Nobel Prize regarding localization in crystals. Following a presentation by Eitan Greenshtein on high dimensional inference, Ralph Smith led an open discussion focused on graduate school and career options related to the program. During dinner on Friday, members of the directorate and program interacted with students to further discuss career opportunities in the field. The workshop concluded on Saturday with presentations by Noureddine El Karoui (UC, Berkeley), Elaine Spiller (SAMSI postdoc) and Donald Richards (Penn State) on topics and applications pertaining to random matrices. Details regarding the workshop can be obtained at the website: http://www.samsi.info/workshops/2006ug-workshop200611.shtml. There were 22 student participants, including 10 females and 7 African Americans.

Development, Assessment and Utilization of Complex Computer Models: March 2-3, 2007

The second workshop focused on topics from the SAMSI Program on Development, Assessment and Utilization of Complex Computer Models. Following an overview presentation by Tom Santner (Ohio State), Ariel Cintron-Arias (SAMSI postdoc) and Peter Reichert (EAWAG) gave presentations on epidemiological models and aquatic ecosystems. Reichert and students taking the associated course then led the students through an extensive R tutorial, focused on the simulation of biogeochemical and ecological processes in lakes, for the remainder of the afternoon. Saturday’s presentations focused on biochemical network modeling (Darren Wilkinson, University of Newcastle), terrestrial models (Sean McMahon, Duke), and climate modeling (Cari Kaufman, SAMSI postdoc). As with the outreach workshop in November, members of the directorate and SAMSI postdocs met with the students during dinner on Friday to discuss graduate and career opportunities. Details regarding the workshop, including the presentations, can be obtained at the website http://www.samsi.info/workshops/2006ug-workshop200703.shtml. There were 25 student attendees, including 11 females, 6 African Americans, and 1 Hispanic.

Undergraduate Workshop: May 22-26, 2006

The one-week SAMSI Workshop for Undergraduates focused on mathematical and statistical topics pertaining to inverse problems. During the initial sessions, students were introduced to physical applications involving structural, acoustic and thermal systems as well as the concepts of forward and inverse problems. Both mathematical and statistical models were derived for a prototypical system consisting of a vibrating beam, and significant attention was focused on the formulation and implementation of least squares relations to estimate material parameters given measured data. The tutorials included substantial exposure to MATLAB and routines for numerical integration and optimization. On the final day of the workshop, each student team presented the results they had obtained during the week. The Undergraduate Workshop encompassed three distinctive components. (i) All tutorials and sessions were presented by SAMSI graduate students and postdocs under close supervision of Smith, members of the Education and Outreach Committee, and local faculty. This allowed the undergraduates to interact with peers within educational and research programs they are considering, and it provided valuable experience for the presenters, many of whom are considering academic careers. (ii) The workshop provided students with an intensive introduction to the synergy between applied mathematics and statistics within the context of timely physical applications. (iii) During one of the sessions, the students were introduced to a variety of experiments and each team collected its own data from the vibrating beam. This exposure to data collection illustrated both the physical basis for models and various mechanisms yielding uncertainty or noise. Whereas a number of aspects were listed as highly positive in exit evaluations, the laboratory experience was one of the most highly ranked.
Full documentation regarding the presentations, tutorials, software, and student presentations can be found at the website http://www.ncsu.edu/crsc/events/ugw06/. There were 23 participants in the workshop from 18 institutions including 7 females, 1 African American, and 1 Hispanic.

Industrial Mathematical and Statistical Modeling Workshop: July 24 - August 1, 2006

The ten-day Industrial Mathematical and Statistical Modeling Workshop for Graduate Students was the 12th in the series and the 4th sponsored by SAMSI. The overall goals of the workshop are twofold: (i) expose mathematics and statistics students to current research problems from government laboratories and industry which have deterministic and stochastic components, and (ii) expose students to a team approach to problem solving. During the workshop, the students learn to communicate with scientists outside their discipline, allocate tasks among team members, and disseminate results through both oral presentations and written reports. For the 2006 workshop, 36 participants were chosen from 26 universities; of the 36, 18 were female and 1 was Hispanic. The attendees were divided into 6 teams to investigate current research problems presented by scientists from Advertising.com, Bank of America, GlaxoSmithKline, Jet Propulsion Laboratory, Lord Corporation, and MIT Lincoln Laboratory. Each team gave a 30-minute oral presentation summarizing its results on the final day of the workshop, and written reports were compiled as SAMSI Technical Report 2006-6, which can be obtained at http://www.samsi.info/reports/index.shtml. Details regarding the workshop can be found at http://www.ncsu.edu/crsc/events/imsm06/.

Diversity:

See Section I.H for discussion of the efforts to promote diversity.

Courses:

See the program reviews in Section I.E for discussion of the SAMSI courses.

F. Industrial and Government Participation

Government and industry participation in SAMSI programs and activities reflects broad interest in the SAMSI vision. The following summarizes participation during 2006-2007.

Development, Assessment and Utilization of Complex Computer Models: There were working group participants from the following companies and governmental labs or agencies: Bell Labs, Environmental Protection Agency, General Electric, General Motors, Los Alamos National Laboratory, National Center for Atmospheric Research, National Institute of Standards and Technology, Pratt and Whitney, and Sandia. One joint workshop was held with LANL, and two with NCAR.

High Dimensional Inference and Random Matrices: The program had an extensive collaboration with the National Center for Atmospheric Research, including multiple working group members and two joint workshops.

Multiplicity and Reproducibility in Scientific Studies: This summer program had working group members from Bristol-Myers Squibb Company, Eli Lilly and Company, the Federal Food and Drug Administration, CIIT Centers for Health Research, and the National Center for Environmental Health Statistics.

National Defense and Homeland Security: Individuals who participated long-term in the working groups were: Deepak Agarwal from AT&T Labs Research, who participated in Anomaly Detection; Lawrence Cox from National Center for Health Statistics, who participated in Anomaly Detection and co-led Data Confidentiality; Kevin Ward Drummey from U.S. Department of Defense, who participated in Anomaly Detection and Social Networks; Joe Fred Gonzalez, Jr. from National Center for Health Statistics, who participated in Anomaly Detection and Data Confidentiality; Myron Katzoff from National Center for Health Statistics, who participated in Anomaly Detection and Social Networks; Richard Picard from Los Alamos National Laboratory; Henry Rolka from National Center for Health Statistics, who participated in Anomaly Detection; Clifford Wang from Army Research Office; and Abera Wouhib from National Center for Health Statistics. Mid-program workshops drew participants from the Bureau of Labor Statistics, Census Bureau, Energy Information Administration, National Center for Education Statistics, and National Center for Health Statistics.

Astrostatistics: Because of the nature of the program, there was no industrial involvement. However, there was significant participation in the working groups from government agencies and laboratories such as NASA-Ames, NASA-Goddard, Smithsonian Astrophysical Observatory, Brookhaven National Laboratory and Fermi National Laboratory. Individuals from these and other such organizations also participated extensively in the workshops.

G. Publications and Technical Reports

I. DEVELOPMENT, ASSESSMENT AND UTILIZATION OF COMPLEX COMPUTER MODELS

Publications and Technical Reports

• Conti, S., O’Hagan, A. "Bayesian Emulation of Complex Multi-output and Dynamic Computer Models" Journal of Statistical Planning and Inference (2007) Research Report No. 569/07. Submitted

• Hacker, J.P., Anderson, J.L., Pagowski, M. “Improved Vertical Covariance Estimates for Ensemble-Filter Assimilation of Near-Surface Observations” Mon. Wea. Rev., 135, 1021-1036 (2007)

• Hacker, J.P., Rostkier-Edelstein, D. “PBL State Estimation with Surface Observations, a Column Model, and an Ensemble Filter” Mon. Wea. Rev., Accepted (2007)

• Li, J., Spiller, E.T., Biondini, G. “Noise-induced Perturbations of Dispersion-managed Solitons” Submitted to Physical Review A, January 2007

• Ong, K.L., Santner, T.J., Bartel, D.L. “Robust Design for Acetabular Cup Stability Accounting for Patient and Surgical Variability” Submitted. (2006)

• Williams, B.J., Santner, T.J., Notz, W.I., Lehman, J.S. “Sequential Design of Computer Experiments for Constrained Optimization” Submitted. (2006)

Reports in Preparation

• Cintron-Arias, A., Banks, H.T., Lloyd, A., Castillo-Chavez, C., Bettencourt, L. “Estimation of Seasonal Influenza Reproductive Numbers” In preparation. (2007)

• Cintron-Arias, A., Banks, H.T., Lloyd, A., Reichert, P. “Analysis of Oscillatory Patterns in Disease Transmission” In preparation. (2007)

• Cintron-Arias, A., Castillo-Chavez, C., Wang, X., Sanchez, F. “The Role of Nonlinear Relapse on Contagion Amongst Drinking Communities” In preparation. (2007)

• Han, G., Santner, T.J., Notz, W.I., Bartel, D.L. “Prediction for Computer Experiments Having Quantitative and Qualitative Input Variables” In preparation. (2007)

• Kaufman, C., Schervish, M., and Nychka, D. “Covariance Tapering for Likelihood Based Estimation in Large Spatial Datasets” (2007)

• Kaufman, C. and Sain, S. “Functional ANOVA Modeling of Regional Climate Model Experiments” In preparation. (2007)

• Kaufman, C. and Bingham, D. “Efficient Emulators of Computer Code Using Covariance Tapering” (2007)

• Li, J., Spiller, E.T., Biondini, G., Kath, W.L. “Symmetries, Conservation Laws, and Linearized Modes of the Dispersion-Managed Nonlinear Schroedinger Equation” In preparation (2007)

• Reichert, P., Mieleitner, J. “Analyzing Input and Structural Uncertainty of a Hydrological Model with Stochastic Time-Dependent Parameters” Draft. (2007)

• Shows, J., Fuentes, M., Bondell, H., Hacker, J.P., Tardif, R. “Stochastic Parameterization of WRF-1D” Manuscript in preparation. (2007)

• Spiller, E.T., Kath, W.L. “Rare Events in Phase Modulated Nonlinear Lightwave Systems” In preparation. (2007)

• Vernieres, G., Gremaud, P., Olufsen, M., DeVault, K., Cintron-Arias, A. “Validation of a Cerebral Blood Flow Model: The Example of the Circle of Willis” To be submitted to the Journal of Physiology (Summer 2007)

• Vernieres, G., Ide, K., Jones, C.K.R.T. “Lagrangian Data Assimilation in the Gulf of Mexico: A Proof of Concept in a Realistic Setting” To be submitted to the Journal of Geophysical Research (2007)

• Vernieres, G., Miller, R.N., Ehret, L. “On the Nonlinearity of the Kuroshio South of Japan” To be submitted to the Journal of Geophysical Research (2007)

• Vernieres, G., Miller, R.N., Ehret, L. “Assimilation of Dynamic Topography Data in a 2 Layer Quasigeostrophic Model of the Kuroshio South of Japan” To be submitted to the Journal of Ocean Modelling (2007)

• White, G. “Stochastic Neighborhood Conditional Autoregressive Prior for Spatial Data” (2007)

• White, G., Reichert, P., Bayarri, S., Santner, T., Pitman, B. “State-space Based Emulators for Computer Model Calibration and Validation” (2007)

II. HIGH DIMENSIONAL INFERENCE AND RANDOM MATRICES

Publications and Technical Reports

• Capobianco, E. “Sieving Genomic Features by Signal Separation and Interference Sparsification” SAMSI 2007-3

• Fan, J., Lv, J. “Sure Independence Screening for Ultra-High Dimensional Feature Space” (2006)

• Greenshtein, E., Park, J., Ritov, Y. "Estimating the Mean of High Valued Observations in High Dimensions" SAMSI 2006-7 Biometrika, Submitted (2006)

• Houdre, C., Litherland, T. “On the Longest Increasing Subsequence for Finite and Countable Alphabets” December 13, 2006 Preprint/Manuscript

• Ipsen, I., Nadler, B. “Refined Perturbation Bounds for Eigenvalues of Hermitian and Non-Hermitian Matrices” SAMSI 2007-2

• Mason, S.J., Galpin, J.S., Goddard, L., Graham, N.E., Rajaratnam, B. “Conditional Exceedance Probabilities” Monthly Weather Review (2007)

• Pal, J. "End-Point Estimation for Decreasing Densities: Asymptotic Behavior of the Penalized Likelihood Ratio" Submitted to Scandinavian Journal of Statistics. SAMSI 2007-1

• Pal, J. “Spiking Problem in Monotone Regression: Penalized Residual Sum of Squares” Submitted to Statistics and Probability Letters 2007

• Pal, J. “Density Estimation” Submitted to Annals of Statistics (2007)

• Pal, J. “Estimation of Smooth Link Functions in Monotone Response Models” Submitted to Journal of Statistical Planning and Inference 2007

• Reddy, C., Chiang, H.D., Rajaratnam, B. “Stability Region Based Expectation Maximization for Model-based Clustering” In proceedings of the IEEE/ACM International Conference on Data Mining (ICDM), Hong Kong, December 2006

• Reddy, C., Chiang, H.D., Rajaratnam, B. “TRUST-TECH Based Expectation Maximization for Learning Finite Mixture Models” Under second revision with IEEE Transactions on Pattern Analysis and Machine Intelligence. October 2006

• Talih, M. "Geodesic Markov Chains on Covariance Matrices" SAMSI 2007-4

Reports in Preparation

• Diciccio, T., Rajaratnam, B., Wells, M.T. “Marginal Likelihood Inference for Eigenvalues” In preparation (Spring 2007)

• Krishnapur, M., Vu, V. “Generalizations of Circular Law” In preparation. (2007)

• Krishnapur, M., Zeitouni, O., Rider, B., Virag, B. “Asymptotic Normality for Determinant Point Processes in Gaussian Unitary Ensemble” In preparation. (2007)

• Krishnapur, M., Virag, B. “Edge Universality via Tridiagonal Models by Markovian Dependence” In preparation. (2007)

• Massam, H., Paul, D., Rajaratnam, B. “Model Search for Discrete Log-linear Models” To be completed by summer. (2007)

• Massam, H., Paul, D., Rajaratnam, B. “Regularization for Likelihood Inference” To be submitted (Summer 2007)

• Pal, J., Wang, X., Walker, M. “Model-independent Estimates of Dark Matter Distributions” (2007)

• Pal, J., Meyer, M. “The Least Squares Regression Spline Decreasing Density Estimator” (2007)

• Pal, J., Richards, D. “Exact Inference for Monotone Incomplete Data” (2007)

• Pal, J., Talih, M. “Methods to Estimate Covariance Matrices in the Manifold of Positive Definite Matrices” (2007)

• Rajaratnam, B., Massam, H., Carvalho, C. “Flexible Covariance Estimation” to be submitted to the Annals of Statistics for the special SAMSI issue. (2007)

III. MULTIPLICITY AND REPRODUCIBILITY

Publications and Technical Reports

• Finner, H., Dickhaus, T., Roters, M. “On the False Discovery Rate and an Asymptotically Optimal Rejection Curve” Submitted (2007)

Reports in Preparation

• Faries, D., Leon, A., Haro, J.M., Obenchain, R. Analysis of Observational Health-Care Data. Book in preparation.

• Rice, K. "Bayesian Decision Theory for Multiple Comparisons" (2006)

• Sivaganesan, S., Laud, P., Mueller, P. "Subgroup Analysis - a Bayesian Decision Theoretic Approach" (2006)

IV. NATIONAL DEFENSE AND HOMELAND SECURITY

Publications and Technical Reports

• Airoldi, E.M. “Bayesian Mixed-Membership Models of Complex and Evolving Networks” Ph.D. Thesis, Carnegie Mellon University (2006)

• Apte, A., Hairer, M., Stuart, A.M., Voss, J. "A Bayesian Approach to Data Assimilation" Physica D

• Bai, P., Banks, H. T., Dediu, S., Govan, A. Y., Last, M., Lloyd, A., Nguyen, H. K., Olufsen, M. S., Rempala, G., and Slenning, B. D. “Stochastic and Deterministic Models for Agricultural Production Networks” Mathematical Biosciences and Engineering. Submitted for publication. (2007)

• Banks, H.T., Karr, A.F., Nguyen, H.K. and Samuels, J.R., Jr. “Sensitivity To Noise Variance In A Social Network Dynamics Model” Quarterly of Applied Mathematics (2005) To appear. Available on-line at www.samsi.info/reports/index.shtml

• Dediu, S., Banks, H.T., Nguyen, H.K. "Sensitivity of Dynamical Systems to Parameters in a Convex Subset of a Topological Vector Space" SIAM Journal of Applied Mathematics. To appear in Mathematical Biosciences and Engineering. (2007)

• Dediu, S., Banks, H.T., Nguyen, H.K. "Time Delay Systems with Distribution Dependent Dynamics" To appear in IFAC Annual Reviews in Control, Plenary Paper, 6th IFAC Workshop (July 06)

• Dediu, S., McLaughlin, J. “Recovering Inhomogeneities in a Waveguide using Eigensystem Decomposition” Inverse Problems, Vol 22, June 2006 pp. 1227-1246

• Denogean, L., Karr, A.F., and Qaqish, B. “Model-based Utility of Doubly Random Swapping” NISS Tech Report (2006)

• Ghosh, J., Reiter, J.P., Karr, A.F. “Secure Computation with Horizontally Partitioned Data Using Adaptive Regression Splines” Computational Statistics and Data Anal. To appear. (2006)

• Karr, A., Feng, J., Lin, X., Sanil, A., and Young, S.S. “Secure Analysis of Distributed Chemical Databases without Data Integration” Journal of Computer Aided Molecular Design. In press. (2005)

• Karr, A.F., Fulp, W.J., Lin, X., Reiter, J.P., Vera, F., and Young, S. S. “Secure, Privacy-Preserving Analysis Of Distributed Databases” Technometrics (2006a) To appear.

• Karr, A.F., Lin, X., Reiter, J.P., Sanil, A.P. “Secure Analysis of Distributed Databases” In Olwell, D., Wilson, A.G., Wilson, G., editors, Statistical Methods in Counterterrorism: Game Theory, Modeling, Syndromic Surveillance, and Biometric Authentication, pages 237-261. Springer-Verlag, New York. (2006b)

• Lynch, J., Vera, F. "General Convex Stochastic Orderings and Related Martingale-Type Structures" Advances in Applied Probability (2006)

• Medhin, N. and Hong, C.C. “A Nonlinear Programming Approach to the Study of Social Networks” (2006a) Submitted to Neural Parallel and Scientific Computation.

• Medhin, N. and Hong, C.C. “Positive And Negative Affinities Model For Social Networks” (2006b) Submitted to Neural Parallel and Scientific Computation.

• Oganian, A., Karr, A.F. “Combinations of SDC Methods for Microdata Protection” In Domingo-Ferrer, J. and Franconi, L., editors, Privacy in Statistical Databases 2006, volume 4302 of Lecture Notes in Comp. Sci., pages 102-113. Springer-Verlag, Berlin/Heidelberg

• Sanil, A.P., Karr, A.F., Lin, X., Reiter, J.P. “Privacy Preserving Analysis of Vertically Partitioned Data Using Secure Matrix Products” J. Official Statist. Submitted for publication. Available on-line at www.niss.org/dgii/technicalreports.html (2004)

• Vance, E. “Social Network Methodology” Journal of Organizational Computation (2007)

• Vance, E., Archie, E.A., Moss, C.J. “Social Networks in African Elephants” J. Computational and Mathematical Organization Theory. Submitted for publication.

• Woo, M.J., Reiter, J.P., Oganian, A., Karr, A.F. “Global Measures of Data Utility for Microdata Masked for Disclosure Limitation” J. Privacy and Confidentiality. Submitted for publication. Available on-line at www.niss.org/dgii/technicalreports.html. (2006)

Reports in Preparation

• Agarwal, D., McGregor, A., Phillips, J.M., Venkatasubramanian, S., and Zhu, Z. “Spatial Scan Statistics: Improved Approximations and a Performance Study” (2006) In preparation.

• Airoldi, E., Banks, D. L., and Xing, E. “Latent Space Mixture Models” (2006) In preparation.

• Dediu, S., Bai, P., Banks, H.T. "Stochastic and Deterministic Agricultural Network Model for Foot and Mouth Disease" In preparation. (2007)

• Denogean, L.R., Karr, A.F., Qaqish, B.F. “Model-Based Utility of Doubly Random Swapping” In preparation. (2007)

• Ghosh, J. A paper on adding modeling flexibility to the secure computation toolkit, which helps address the difficulty agencies face in specifying models without seeing others’ data, is in progress. (2007)

• Karr, A.F., Oganian, A., Reiter, J., and Wu, M-J. “Combinations of SDC Methods” (2006) In preparation.

• Karr, A.F., Oganian, A., Reiter, J., and Wu, M-J. “New Measures of Data Utility” (2006) In preparation.

• Nettel-Aguirre, A. and Chipman, H. “Mining Transactional Data using Latent Space Social Network Models” (2006) In preparation.

• Qaqish, B.F., Denogean, L.R., Karr, A.F. “A Stochastic Process Approach to the Analysis of Swapping for Categorical Variables” In preparation.

V. ASTROSTATISTICS

Publications and Technical Reports

• Babu, G. J., Mahabal, A., Williams, R., and Djorgovski, S. G. (2007). “Object Detection in Multi-Epoch Data”. To appear in the proceedings of Astronomical Data Analysis IV.

• Clyde, M. A., Berger, J. O., Bullard, F., Ford, E. B., Jefferys, W. H., Luo, R., Paulo, R., and Loredo, T. (2007). “Current Challenges in Bayesian Model Choice”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• Connors, Alanna and van Dyk, David A. (2007). “How to Win with Non-Gaussian Data: Poisson Goodness-of-Fit”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• Ford, E. B., and Gregory, P. C. (2007). “Bayesian Model Selection and Extrasolar Planet Detection”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• Ford, E. B., and Rasio, F.A. (2007). “Origins of Eccentric Extrasolar Planets: Testing the Planet-Planet Scattering Model”. Submitted to ApJ.

• Gregory, P. C. (2007). “A Bayesian Kepler Periodogram Detects a Second Planet in HD208487”. Monthly Notices of the Royal Astronomical Society, Volume 374, Issue 4, pp. 1321-1333. Publication Date: 02/2007. SAMSI 2006-5

• Heinrich, J., and Lyons, L. (2007). “Systematic Errors”. Annual Reviews of Particle and Nuclear Physics, to appear.

• Jang, W., Hendry, M. (2007). “Cluster Analysis of Massive Datasets in Astronomy”. Submitted to Statistics and Computing.

• Jeffery, E. J., von Hippel, T., Jefferys, W. H., Winget, D. E., Stein, N., & DeGennaro, S. (2007). “New Techniques to Determine Ages of Open Clusters Using White Dwarfs”. ApJ, Volume 658, 391.

• Jefferys, W. H. (2007). “Current Challenges in Bayesian Model Choice: Comments”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• Lee, H. (2006). “Two Topics: A Jackknife Maximum Likelihood Approach to Statistical Model Selection and a Convex Hull Peeling Depth Approach to Nonparametric Massive Multivariate Data Analysis with Applications”. Ph.D. Thesis, Penn State University.

• Loredo, T. J. (2007). “Analyzing Data From Astronomical Surveys: Issues and Directions”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• Lyons, L. (2007). “A Particle Physicist’s Perspective on Astrophysics”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• Maness, H. L.; Marcy, G. W.; Ford, E. B.; Hauschildt, P. H.; Shreve, A. T.; Basri, G. B.; Butler, R. P.; Vogt, S. S. (2007). “The M Dwarf GJ 436 and its Neptune-Mass Planet”. Publications of the Astronomical Society of the Pacific, Volume 119, 90-101.

• Park, Taeyoung, van Dyk, David A., and Siemiginowska, Aneta (2007). “Fitting Narrow Emission Lines in X-ray Spectra: Computation and Methods”. Under revision for the Astrophysical Journal.

• Roe, B. (2007). “Nuclear Instruments and Methods”. A570 p 159.

• Sen, B., Walker, M., and Woodroofe, M. “On the Unified Method with Nuisance Parameters” Michigan preprint, submitted for publication.

• van Dyk, David A., Park, Taeyoung, and Siemiginowska, Aneta (2007). “Fitting Narrow Spectral Lines in High-Energy Astrophysics Using Incompatible Gibbs Samplers”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. D. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• von Hippel, T., Jefferys, W. H., Scott, J., Stein, N., Winget, D. E., DeGennaro, S., Dam, A., & Jeffery, E. (2006), “Inverting Color-Magnitude Diagrams to Access Precise Star Cluster Parameters: A Bayesian Approach”. ApJ, 645, 1436.

Reports in Preparation

• Baines, P. “Upper Limits for Source Detection in the Three-Poisson Model” To be presented at a meeting in August 2007

• Bullard, F. “Improving the Efficiency of Scheduling Radial Velocity Measurements for Exoplanet Detection Using Bayes and a Fast Integral Estimator” Thesis in progress.

• Chernoff, L., Clyde, M., Berger, J. and Bullard F. “Bayesian Methods for Exoplanet Radial Velocity Data: The Kepler Periodogram and Evolutionary Markov Chain Monte Carlo” (2007)

• Davison, A.C., and Sartori, N. “Likelihood Inference for a Problem in Particle Physics” (2007)

• Demortier, L. “P-Values: What They Are and How to Use Them” CDF note 8662 (2007)

• Edlefsen, P. “A Dempster-Shafer Bayesian Solution to the Banff A1 Challenge” To be presented at a meeting in August 2007

• Jin, Z., Linneman, J., and Reid, N. “A Note on Measures of Significance in HEP and Astrophysics: Some Higher Order Approximations” (2007)

• Lee, T., Young, A., Kashyap, V., and van Dyk, D. “Detection and Classification of Sunspots Groups Captured in Magnetograms” (2007)

• van Dyk, D., Kashyap, V., Siemiginowska, A., and Zezas, A. “Upper Limits, Detection Limits, and Confidence Intervals” (2007)

• Williams, L., and Zhu, Z. “Reconstruction of the Galaxy Cluster Mass Distribution Using Gravitational Lensing and Semi-Parametric Spatial Mixed Effects Model” (2007)

• Yu, Y., Kashyap, V., and van Dyk, D. “Statistical Modeling of Sunspot Cycles” (2007)

• The Exoplanets working group is planning on two articles:

1. Berger, Jim, Floyd Bullard, Merlise Clyde, Eric Ford, Phil Gregory, Bill Jefferys and others (potential authors) “Stochastic Computation of Bayes Factors for Model Selection for Exoplanets”

2. Bullard, Floyd and Merlise Clyde “Nested Importance Sampling”

• The Source and Feature Detection working group expects to prepare a technical report “Statistical Methods for Computing Upper Limits for the Intensity of Weak Astronomical Sources” In preparation.

X. FINANCIAL MATHEMATICS, STATISTICS AND ECONOMETRICS

Publications and Technical Reports

• Anderson, E., E. Ghysels and J. Juergens “The Impact Of Risk And Uncertainty On Expected Return” Discussion paper ASU and UNC.

• Andreou, E. and B. Werker “An Alternative Asymptotic Analysis of Residual-Based Statistics” Discussion paper Tilburg University.

• Boes, M.-J., F.C. Drost and Bas Werker “Nonparametric Risk-Neutral Return And Volatility Distributions” Discussion paper Tilburg University.

• Boes, M.-J., F.C. Drost and Bas Werker “The Impact of Overnight Periods on Option Pricing” Journal of Financial and Quantitative Analysis, to appear.

• Chabi-Yo, F., E. Ghysels and E. Renault “Disentangling the Effect of Heterogeneous Beliefs and Preferences on Asset Prices” Discussion paper UNC.

• Chen, X. and E. Ghysels “Intra-day News Impact Curves and Realized Volatility” Discussion paper UNC.

• Engle, R., E. Ghysels and B. Sohn “On the Economic Sources of Stock Market Volatility” Discussion paper NYU and UNC.

• Fouque, JP, Sircar, R., and Solna, K. “Stochastic Volatility Effects on Defaultable Bonds” Applied Mathematical Finance. To appear 2006.

• Fouque, JP and Zhou, X. “Modeling Correlated Defaults: First Passage Model under Stochastic Volatility” To be submitted (April 2006)

• Horst, E. Thesis “A Lévy Generalization of Compound Poisson Processes in Finance: Theory and Applications” Presented at the 6th World Congress of the Bernoulli Society for Mathematical Statistics and Probability, 67th Annual Meeting of the Institute of Mathematical Statistics. Barcelona, Spain. (Jul-Aug 2004)

• Koijen, R.S.J., T. Nijman and B. Werker “Labor Income and the Demand for Long-Term Bonds” Discussion paper Tilburg University.

• Pang, T., Pemy, M. and Chang, M-H. “Optimal Stopping for Stochastic Functional Differential Equations” SIAM Journal of Optimization and Control, submitted.

• Pang, T., Pemy, M. and Chang, M-H. “Optimal Control of Functional Stochastic Differential Equations with Bounded Memory” International Journal of Probability and Stochastic Processes, submitted.

• Pang, T., Pemy, M. and Chang, M-H. “Viscosity Solutions of Infinite Dimensional Black-Scholes Equation and Numerical Approximations” Submitted.

• Pemy, M., Chang, C.H., Pang, T. "Finite Difference Approximations for Stochastic Control System with Delay" Submitted (2007)

• Pemy, M., Yin, G. and Zhang, Q. “Liquidation of a Large Block of Stock” To appear in the Journal of Banking and Finance (December 2006)

• Pemy, M., Chang, C.H., Pang, T. "Numerical Methods for Stochastic Optimal Stopping with Delay" Submitted. (2007)

• Pemy, M., Zhang, Q., Yin, G. "Selling a Large Stock Position: A Stochastic Control Approach with State Constraints" Submitted (2007)

• Pemy, M., Zhang, Q. “Optimal Stock Liquidation in a Regime Switching Model with Finite Time Horizon” Journal of Mathematical Analysis and Applications, 312, (2006) 537-552

• Renault, E. and B. Werker “Causality Effects in Return Volatility Measures with Random Times” Discussion paper Tilburg University.

Reports in Preparation

• Fouque, JP and Zhou, X. “Perturbed Gaussian Copulas” In preparation.

• Fouque, JP and Rodriguez, J. “Singular Perturbations for SPDE’s” In preparation.

• Lenczner, M., Pemy, M. “Controlled Cellular Neural Network” In preparation.

• Sinko, A. Working on three papers on MIDAS regressions and how they relate to volatility modeling. (2007)

XVI. BOOKS AND MISC. PUBLICATIONS

• Chertok, A., Haider, M.A., Olufsen, M., Smith, R. Twelfth Industrial Mathematical and Statistical Modeling Workshop for Graduate Students, SAMSI 2006-6, Aug 06

• Fouque, JP, Papanicolaou, G., Sircar, R., and Solna, K. The following book was written in part during the FMSE program: Volatility Perturbations in Financial Market, Cambridge University Press. In preparation.

• Kang, W. and A.J. Krener Control Bifurcations The notes were developed in part while Krener taught the course MA 797R as detailed in Section 3.13. These notes were also typeset during the program.

H. Efforts to Achieve Diversity

SAMSI puts considerable emphasis on achieving diversity. During the past year, we co-sponsored, with the Department of Mathematics at UNC-Chapel Hill, the Conference for African-American Researchers in the Mathematical Sciences (held in Chapel Hill and at SAMSI in June 2006).

An opportunity arose during that meeting to collaborate with a group of African-American women who have started the Infinite Possibilities Conference. This is a conference for minority women who are aiming to build careers in the mathematical sciences. The next one will be held in Raleigh, NC in November 2007 and will be co-sponsored and co-organized by SAMSI. In 2008, the Blackwell-Tapia Conference will be held at SAMSI.

An institutes’ diversity coordination committee was formed in 2006 by Chris Jones (SAMSI) and Helen Moore (formerly of AIM). It is now chaired by Jones. This committee coordinates diversity initiatives and activities across the mathematical sciences institutes in the US and Canada. In particular, it will be organizing a reception at the 2007 SACNAS meeting.

Under the auspices of the diversity committee and partly sponsored by SAMSI and MSRI, a dinner was held in honor of David Blackwell and Bill Massey (2006 recipient of the Blackwell-Tapia prize). The dinner was in Berkeley, CA on March 28, 2007. Thirty people attended, including over twenty African-Americans who credited Blackwell and Massey among their mentors and sources of inspiration.

Development, Assessment and Utilization of Complex Computer Models: Minorities, in particular women, are very well represented. Susie Bayarri was the overall program leader and Montse Fuentes was heavily involved in the leadership of the Environmental Subprogram. The Program Leaders strived to have women representation in all working groups, workshops and activities. In particular, when invitations have been issued (for participation in different activities), adequate representation (senior/junior, geographical and minorities) have been actively pursued. For instance, speakers at workshops were 20- 25\% female, even though Computer Models is an area where extremely few senior women are active. Also, five of the nine official graduate students in the program were female and one was Hispanic.

Random Matrices and High Dimensional Inference: One of the program leaders, Helene Massam, is a woman, and a number of women were very active in the program. In particular, Nanny Wermuth (Chalmers University) organized the Graphical Models workshop. In the opening workshop, 25 out of 150 participants were women and 35 were new researchers. At least one speaker in each main session was female; in total, there were five women among the main speakers.

Donald Richards, who is African-American, was very active in almost every aspect of the program. He visited SAMSI for the entire Fall Semester, led the Multivariate Distributions working group and was an organizer of the Transition Workshop at AIM. In addition, Leonard Choup, also African-American, was active in the Berkeley node and a participant at the AIM workshop.

Summer Program on Multiplicity and Reproducibility in Scientific Studies: Juliet Shaffer was one of the program leaders and Rochelle Tractenberg assumed a follow-up leadership role in the post-summer activities of the Subgroup Analysis Working Group. Almost 25% of the speakers in the opening workshop were female, and between 25% and 33% of the participants in the working groups were female.

Conference for African-American Researchers in the Mathematical Sciences: The 12th annual CAARMS was held in Chapel Hill, June 20-23, 2006. It attracted more than 70 African-American researchers in the mathematical sciences and was judged a huge success by the participants. Talks included those given by: Arlie Petters (Duke), Kim Weems (NCState), Rudy Horne (Florida State), Dominic Clemence (NC A&T), Charles Hagwood (NIST), Ethlebert Chukwu (NCState), Jeffrey Forbes (Duke), Gelonia Dent (NC A&T), Farrah Jackson Chandler (UNC-Wilmington) and Otis Jennings (Duke). A poster session was held on the second evening, featuring posters by grad students and postdocs with a competition for “best poster.” The banquet on Thursday evening featured an after-dinner speech by Johnny Houston (Elizabeth City State University). Chris Jones gave a tutorial as well as an introduction to SAMSI and its programs. The conference was co-organized with the Department of Mathematics at UNC-Chapel Hill and most of the events were held on campus. The participants all came out to SAMSI on Friday afternoon and held a lunch followed by an extended discussion on issues of interest to the community, including the possibility of future initiatives.

Education and Outreach Program: SAMSI continues to use its E&O Program to enhance its diversity efforts by active recruitment of under-represented participants. We are actively recruiting from HBCUs for all programs and are continuing to augment the recruitment of Hispanics and Native Americans through the assistance of members of the National Advisory and Education and Outreach Committees. The diversity breakdown in specific E&O workshops is as follows.
• Undergraduate Workshop (May 2006): Of the 23 participants, 7 were female, 1 was African American, and 1 was Hispanic.
• Industrial Mathematical and Statistical Modeling (IMSM) Workshop (July 2006): Of the 36 participants, 18 were female and 1 was Hispanic.
• 2-Day Undergraduate Workshop (November 2006): Of the 22 participants, 10 were female and 7 were African American.
• 2-Day Undergraduate Workshop (March 2007): Of the 25 participants, 11 were female, 6 were African American and 1 was Hispanic.

National Defense and Homeland Security: Women and new researchers were well represented throughout the program. One of the program leaders is a woman. Five new researchers and five women presented at the Opening Workshop. The Opening Workshop had 25 female, five African-American, four Hispanic, and 33 New Researcher participants in attendance. The leader of the Agricultural Systems working group is a woman; that working group also had one regular female participant, one female postdoc, and one female student. The Anomaly Detection working group had two active female participants, one female postdoc and one female graduate student. The Data Confidentiality working group had three female postdocs and two female students. The Social Networks working group had one female postdoc and one female student. At the Mid-Year Meeting on Anomaly Detection there were six female, two Hispanic, and 10 New Researcher participants in attendance. At the Mid-Year Meeting on Social Networks there were five female, one Hispanic, and 11 New Researcher participants in attendance. At the Mid-Year Meeting on Data Confidentiality there were eight female, three Hispanic, and eight New Researcher participants in attendance.

Astrostatistics: Three African American researchers were central to the program. Harrison Prosper was a major participant in the Statistical Issues in Particle Physics intensive session, and Arlie Petters was the leader of the Gravitational Lensing working group. Participation of women was also extensive. Merlise Clyde was the co-leader of the Exoplanets working group. Susie Bayarri and Barbara McArthur (who gave a keynote presentation at the kickoff meeting) also participated in this group. Ruth Stella Barrera Rojas and Alanna Connors -- who was also on the overall program leaders committee -- participated via teleconference in the Surveys and Population Studies and the Source and Feature Detection working groups. A graduate student, Hyunsook Lee, participated in all the working groups. In addition, Rebecca Willett, Aneta Siemiginowska, and Ramani Pilla participated in the Source and Feature Detection working group; Pushpa Bhat participated in the Statistical Issues in Particle Physics intensive session; and Elizabeth Jeffery participated in the Stellar Evolution intensive session. Two other women, Megan Sosey and Fabrizia Guglielmetti, were involved in the planning meeting for the program.

Workshops: There were, of course, numerous workshop participants from underrepresented groups, as indicated in the following table. Also listed are the numbers of new researchers at each of the workshops.

2005-06 Programs

Activity | Program Year | # Participants | # Female | # African-American | # Hispanic | # New Researchers/Students

Astrostatistics Program

Astrostatistics Transition Workshop (in conjunction with SCMA VI at Penn State) -- June 12-15, 2006 | 2005-06 | 104 | 17 | 1 | 0 | 45
Astrostatistics Transition Workshop (in conjunction with PHYSTAT at BIRS) -- July 15-20, 2006 | 2006-07 | 33 | 5 | 0 | 0 | 3

Education and Outreach Program

SAMSI-CRSC Undergraduate Workshop -- May 22-26, 2006 | 2005-06 | 23 | 8 | 1 | 1 | 23
12th Annual Conference for African American Researchers in Mathematical Sciences (CAARMS) -- June 20-23, 2006 | 2005-06 | 68 | 19 | 63 | 0 | 30

2006-07 Programs

Activity | Program Year | # Participants | # Female | # African-American | # Hispanic | # New Researchers/Students

Development, Assessment and Utilization of Complex Computer Models

Summer School on the Design and Analysis of Computer Experiments (at IRMACS, Simon Fraser U) -- August 11-16, 2006 | 2006-07 | 57 | 12 | 0 | 1 | 28
Development, Assessment and Utilization of Complex Computer Models (CompMod) Opening Workshop & Tutorials -- September 10-13, 2006 | 2006-07 | 124 | 25 | 2 | 2 | 28
CompMod Joint Engineering and Methodology Subprogram Workshop -- October 26-27, 2006 | 2006-07 | 22 | 6 | 0 | 1 | 8
CompMod Biosystems Modeling Workshop -- March 5-7, 2007 | 2006-07 | 62 | 14 | 0 | 0 | 31
CompMod Joint SAMSI/MUCM Mid-Program Workshop -- April 2-3, 2007 | 2006-07 | 29 | 4 | 0 | 1 | 10
CompMod Terrestrial Mid-Program Workshop -- April 4, 2007 | 2006-07 | 16 | 5 | 0 | 0 | 10
Development, Assessment and Utilization of Complex Computer Models (CompMod) Transition Workshop (at NCAR) -- May 14-16, 2007 | 2006-07 | to be reported in the 2007-08 Annual Report
CompMod One-Day Workshop on Calibration of Computational Models of Cerebral Blood Flow -- May 17, 2007 | 2006-07 | to be reported in the 2007-08 Annual Report

High Dimensional Inference and Random Matrices

High Dimensional Inference and Random Matrices (HDIRM) Opening Workshop & Tutorials -- September 17-20, 2006 | 2006-07 | 146 | 25 | 1 | 0 | 35
Random Matrices Program Bayesian Focus Week -- October 30-November 3, 2006 | 2006-07 | 49 | 9 | 0 | 0 | 22
Large Graphical Models and Random Matrices Workshop -- November 9-11, 2006 | 2006-07 | 23 | 8 | 1 | 0 | 9
Workshop on Geometry, Random Matrices and Statistical Inference -- January 16-19, 2007 | 2006-07 | 34 | 6 | 0 | 0 | 17
High Dimensional Inference and Random Matrices Transition Workshop (at AIM, Palo Alto, CA) -- April 10-13, 2007 | 2006-07 | to be reported in the 2007-08 Annual Report

Summer Program on Multiplicity and Reproducibility

Multiplicity and Reproducibility in Scientific Studies Opening Workshop -- July 10-12, 2006 | 2006-07 | 60 | 20 | 0 | 3 | 12
Multiplicity and Reproducibility in Scientific Studies Closing Workshop -- July 27-28, 2006 | 2006-07 | Informal Meeting

Summer Program on Adaptive Treatment Design

Summer Program on Adaptive Treatment Design -- June 18-29, 2007 | 2006-07 | to be reported in the 2007-08 Annual Report

2006-07 Education and Outreach

SAMSI-CRSC Industrial Mathematical & Statistical Modeling Workshop for Graduates -- July 24-August 1, 2006 | 2006-07 | 38 | 20 | 0 | 1 | 36
Undergraduate Two-Day Workshop -- November 17-18, 2006 | 2006-07 | 31 | 12 | 8 | 0 | 21
Undergraduate Two-Day Workshop -- March 2-3, 2007 | 2006-07 | 31 | 13 | 6 | 1 | 25
SAMSI/CRSC Interdisciplinary Workshop for Undergraduates -- May 21-25, 2007 | 2006-07 | to be reported in the 2007-08 Annual Report

Co-sponsored and Informal Meetings and Workshops

T-O-Y 2007 Workshop on Geophysical Models (at NCAR) -- November 13-14, 2006 | 2006-07 | 46 | 9 | 0 | 0 | 13
Dynamics of Infectious Diseases One-day Working Group Meeting -- February 16, 2007 | 2006-07 | Informal Meeting
Dynamics of Infectious Diseases One-day Working Group Meeting -- March 16, 2007 | 2006-07 | Informal Meeting
Dynamics of Infectious Diseases One-day Working Group Meeting -- April 13, 2007 | 2006-07 | to be reported in the 2007-08 Annual Report
Dynamics of Infectious Diseases Three-day Working Group Meeting (at VA Tech) -- May 3-5, 2007 | 2006-07 | to be reported in the 2007-08 Annual Report
T-O-Y 2007 Workshop on the Application of Random Matrices Theory and Methods -- May 7-9, 2007 | 2006-07 | to be reported in the 2007-08 Annual Report
T-O-Y 2007 Workshop on the Application of Statistics to Numerical Models: New Methods and Case Studies (at NCAR) -- May 21-24, 2007 | 2006-07 | to be reported in the 2007-08 Annual Report

Upcoming 2007-08 Meetings and Workshops

Summer Program on the Geometry and Statistics of Shape Spaces -- July 7-13, 2007 | 2007-08 | to be reported in the 2007-08 Annual Report
SAMSI/CRSC Industrial Mathematical & Statistical Workshop for Graduate Students -- July 23-31, 2007 | 2007-08 | to be reported in the 2007-08 Annual Report
Risk Analysis, Extreme Events and Decision Theory Opening Workshop -- September 16-19, 2007 | 2007-08 | to be reported in the 2007-08 Annual Report
Random Media Opening Workshop -- September 23-26, 2007 | 2007-08 | to be reported in the 2007-08 Annual Report

I. External Support and Affiliates

1. External Support

Astrostatistics: The transition workshop was jointly supported by the Center for Astrostatistics at Penn State University. The funding for the BIRS/SAMSI workshop was mainly provided by BIRS.

Development, Assessment and Utilization of Complex Computer Models: The Summer School on the Design and Analysis of Computer Experiments (8/11/06-8/16/06 at Simon Fraser U.) was cosponsored with the Canadian National Program on Complex Data Structures. Two workshops were cosponsored by the National Center for Atmospheric Research. NCAR also provided half the support for one postdoctoral fellow (Cari Kaufman). The Managing Uncertainty in Complex Systems program in the UK co-funded one joint workshop and provided funding for many of its members to visit SAMSI.

High Dimensional Inference and Random Matrices: The program had one joint workshop at NCAR and one at the American Institute of Mathematics.

Affiliates: Significant support arose from the Affiliates, as discussed in the next section.

2. Affiliate Involvement

2.1. Background

The NISS Affiliates Program and NISS/SAMSI University Affiliates Program are the largest programs of their kind among the DMS-funded mathematical sciences research institutes. NISS director Alan Karr, associate director Nell Sedransk and assistant director Stanley Young have major responsibility for operation of these programs.

As a benefit of membership, NISS Affiliates and NISS/SAMSI University Affiliates may receive reimbursement for expenses to attend SAMSI workshops as well as NISS events. Through meetings and other activities, the NISS Affiliates and NISS/SAMSI University Affiliates inform the development of SAMSI programs. To illustrate, the 2006–07 program on Development, Assessment and Utilization of Complex Computer Models, as well as the National Defense and Homeland Security program in 2005–06, the Latent Variable Models in the Social Sciences (LVSS) program in 2004–05 and the DMML program for 2003–04, all reflect affiliate interest to a significant degree. The upcoming 2007–08 program on Risk Analysis, Extreme Events and Decision Theory also responds to strong affiliate interest.

NISS Affiliates and NISS/SAMSI Affiliates are listed below:

Corporations: Avaya Labs, Basking Ridge, NJ; Aventis Pharmaceuticals, Bridgewater, NJ; Bell Labs–Alcatel/Lucent, Murray Hill, NJ; General Motors, Detroit, MI; GlaxoSmithKline, Research Triangle Park, NC and Collegeville, PA; Eli Lilly, Indianapolis, IN; Merck Research Laboratories, West Point, PA; Metabolon, Inc., Research Triangle Park, NC; MetaMetrics, Inc., Durham, NC; RTI International, Research Triangle Park, NC; SAS Institute, Cary, NC; SPSS, Chicago, IL; Wyeth, Collegeville, PA; Xerox Innovation Group, Webster, NY

Government Agencies and National Laboratories: Bureau of the Census, Washington, DC; Bureau of Labor Statistics, Washington, DC; Energy Information Administration, Washington, DC; Los Alamos National Laboratory, Los Alamos, NM; National Agricultural Statistics Service, Fairfax, VA; National Cancer Institute, Bethesda, MD; National Center for Education Statistics, Washington, DC; National Center for Health Statistics, Hyattsville, MD; National Security Agency, Ft. George G. Meade, MD

NISS/SAMSI University Affiliates: University of California Berkeley, Department of Statistics; Carnegie Mellon University, Department of Statistics; University of Connecticut, Department of Statistics; Duke University, Institute of Statistics and Decision Sciences and Department of Mathematics; Emory University, Department of Biostatistics; University of Florida, Department of Statistics; Florida State University, Department of Statistics; George Mason University; University of Georgia, Department of Statistics; University of Illinois Urbana–Champaign, Department of Statistics; University of Iowa, Department of Statistics; Iowa State University, Department of Statistics; Johns Hopkins University, Department of Applied Mathematics and Statistics; University of Michigan, Departments of Statistics and Biostatistics; University of Missouri Columbia, Department of Statistics; North Carolina State University, Department of Statistics; North Carolina State University, Department of Mathematics; University of North Carolina at Chapel Hill, Department of Statistics and Operations Research; University of North Carolina at Chapel Hill, Department of Biostatistics; University of North Carolina at Chapel Hill, Department of Mathematics; Oakland University, Department of Mathematics and Statistics; Ohio State University, Department of Statistics; Pennsylvania State University, Department of Statistics; Purdue University, Department of Statistics; Rice University, Department of Statistics; Rutgers University, Department of Statistics; University of South Carolina, Department of Statistics; Southern Methodist University, Statistical Science Department; Stanford University, Department of Statistics; Texas A&M University, Department of Statistics; Virginia Commonwealth University, Biostatistics

2.2 Affiliate Participation

Every SAMSI program and event during 2005–06 had strong Affiliate participation, nearing one-half of attendees at some workshops.

Participation by affiliates in SAMSI programs remains extremely strong. Examples include:

Astrostatistics: There were working group participants from each of the following university affiliates: University of California Berkeley, Carnegie Mellon University, Duke University, University of Georgia, University of Michigan, University of North Carolina at Chapel Hill, Pennsylvania State University, and Purdue University.

Computer Models: There were working group participants from the following industrial and governmental affiliates: Bell Labs, General Motors, LANL, and NIST. There were working group participants from the following university affiliates: Duke University, University of Georgia, Iowa State University, University of North Carolina at Chapel Hill, North Carolina State University, Oakland University and Ohio State University.

High-Dimensional Inference and Random Matrices: Working group participants came from each of the following university affiliates: the University of California Berkeley, Duke University, the University of Florida, the University of Georgia, the University of Michigan, the University of Missouri, North Carolina State University, the University of North Carolina, Penn State University and Stanford University.

The affiliates program co-sponsored a “Workshop on Modifying Surveys in Response to Disruptions,” held at the Bureau of Labor Statistics (BLS) in Washington, DC on March 15–16, 2007; the American Statistical Association (ASA) and BLS also were sponsors. This workshop, whose roots derive from the 2004–05 LVSS program, drew more than 50 attendees from nearly a dozen affiliates.

2.3 Plans for the Future

There is affiliate interest in all SAMSI programs, but interest is especially strong in
• The 2006–07 intensive summer program on Dynamic Treatment Regimes and Multistage Decision-Making, two of whose leaders are from affiliates.
• The 2007–08 program on Risk Analysis, Extreme Events and Decision Theory, which also has two program leaders from affiliates.

J. Advisory Committees

Name | Affiliation | Field | Term

Governing Board

Bruce Carney | UNC, Assoc. Dean | Astronomy
George Casella | U of Florida (ASA Rep) | Statistics
Tom Manteuffel | U of Colorado (SIAM Rep) | Applied Mathematics
Vijay Nair | NISS Trustees Chair | Statistics
John Simon | Duke, Asst. Provost | Chemistry
Daniel Solomon (Chair) | NCSU, Dean | Statistics

National Advisory Committee

Mary Ellen Bock (Co-Chair) | Purdue U | Statistics | 2002-2007
Lawrence Brown | U of Pennsylvania | Statistics | 2002-2007
Raymond Carroll | Texas A&M U | Statistics | 2005-2007
Carlos Castillo-Chavez (Co-Chair) | Arizona State U | Mathematics | 2003-2006
Rick Durrett | Cornell U | Mathematics | 2006-2008
Nancy Kopell | Boston U | Mathematics | 2006-2008
Rod Little | U of Michigan | Biostatistics | 2006-2008
David Mumford | Brown U | Applied Mathematics | 2006-2008
Daryl Pregibon | Google, Inc | CS and Statistics | 2003-2006
G.W. Stewart | U of Maryland | Computer Science | 2003-2006
Mary Wheeler | U of Texas | Mathematics | 2005-2007
Bin Yu | U of CA, Berkeley | Statistics | 2006-2008

Local Development Committee

David Banks | Duke | Statistics
H.T. Banks | NCSU | Mathematics
Lloyd Edwards | UNC | Biostatistics
Gregory Forest | UNC | Mathematics
Montserrat Fuentes | NCSU | Statistics
John Harer | Duke | Mathematics
Sharon Lubkin | NCSU | Mathematics
Richard Smith | UNC | Statistics
Butch Tsiatis | NCSU | Statistics
Mike West | Duke | Bioinformatics & Stats

Chairs Committee

Patrick Eberlein | UNC | Mathematics
Loek Helmnick | NCSU | Mathematics
Thomas Kepler | Duke | Biostatistics
Michael Kosorok | UNC | Biostatistics
Vidyadhar Kulkarni | UNC | Statistics
Sastry Pantula | NCSU | Statistics
Dalene Stangl | Duke | Statistics
Mark Stern | Duke | Mathematics

Education and Outreach Committee

Negash Begashaw | Benedict College | Mathematical Sciences
Carlos Castillo-Chavez (ex officio) | Arizona State U | Mathematics
Karen Chiswell | NCSU | Statistics
Cammey Cole | Meredith College | Mathematics & CS
Wei Feng | UNC-Wilmington | Mathematics & Statistics
Marian Hukle | U of Kansas | Biological Sciences
Negash Medhin | NCSU | Mathematics
Masilamani Sambandham | Morehouse College | Mathematics
Ralph Smith (chair, ex officio) | NCSU | Mathematics

II Special Reports: Program Plan

A. Programs for 2007-2008

1. Risk Analysis, Extreme Events and Decision Theory

1.1 Summary

Over the past several years, there has been a wealth of scientific progress on risk analysis. As the set of underlying problems has become increasingly diverse, drawing from areas ranging from national defense and homeland security to genetically modified organisms to animal disease epidemics and public health to critical infrastructure, much research has become narrowly focused on a single area. It has also become clear, however, that the need is urgent and compelling for research on risk analysis, extreme events (such as major hurricanes) and decision theory in a broader context. Availability of past information, expert opinion, complex system models, and financial or other cost implications as well as the space of possible decisions may be used to characterize the risks in different settings. Integration of expertise developed by researchers in different scientific communities on each of these facets is the objective of this SAMSI program. Risk analysis and extreme events also carry a significant public policy component, which is driven in part by the increasing stakes and the multiplicity of stakeholders. In particular, policy concerns direct attention not only to the dramatic risks for huge numbers of people associated, for example, with events of the magnitude of Hurricane Katrina or bioterrorism, but also to "small-scale" risks such as drug interactions driven by rare combinations of genetic factors.

1.2 Research Foci

The aim of this full-year program is to address fundamental issues in risk analysis and the linked problems associated with extreme events and decision theory. By engaging researchers from the statistical sciences, applied mathematical sciences including actuarial science, and the decision sciences, including operations research, the goal is to set research agendas that can produce genuine impact on the practice of risk analysis and assessment as well as on theory and methodology for extreme events and decision theory. Interdisciplinary working groups are likely to form around both kinds of events and critical research tasks in theory and methodology, following the already identified interests and the existing momentum. Critical research tasks for this program include theoretical development of extreme value theory, implementations of methodologies that integrate expert opinion with data and with models, and risk assessment and prediction with applications to high-impact events.

1.2.1 Technical Problems for Statistical / Mathematical Research

(1) Extreme-value theory for multidimensional extremes. Probability theory and statistical methodology for one-dimensional distributions of extreme values have been developed over the past half-century, but these do not extend easily to higher dimensions. The contexts of extreme events such as natural disasters or man-created events of great magnitude and rarity are both complex and multi-factor in their origin and also high-dimensional in their consequences. Thus the development of a usable probability theory and accompanying statistical methodology is crucial if statistical thinking is to be applied to these kinds of events; α-stable processes form one focus for theoretical research.
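As an illustration of the well-developed one-dimensional theory mentioned in item (1), the following is a minimal Python sketch and not part of the program plan. It assumes that block maxima are approximately Gumbel distributed (one of the classical limiting families), fits location and scale by the method of moments, and computes a return level. The simulated exponential data, the function names and the 100-block return period are all illustrative assumptions.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(maxima):
    """Method-of-moments fit of a Gumbel distribution to block maxima."""
    mean = statistics.fmean(maxima)
    sd = statistics.stdev(maxima)
    scale = sd * math.sqrt(6) / math.pi   # Gumbel sd = scale * pi / sqrt(6)
    loc = mean - EULER_GAMMA * scale      # Gumbel mean = loc + gamma * scale
    return loc, scale


def return_level(loc, scale, period):
    """Level exceeded on average once every `period` blocks."""
    p = 1.0 - 1.0 / period                # non-exceedance probability
    return loc - scale * math.log(-math.log(p))


def simulate_block_maxima(n_blocks, block_size, rng):
    """Hypothetical data: maxima of blocks of unit-rate exponential draws."""
    return [max(rng.expovariate(1.0) for _ in range(block_size))
            for _ in range(n_blocks)]


rng = random.Random(0)
maxima = simulate_block_maxima(500, 365, rng)
loc, scale = fit_gumbel_moments(maxima)
level_100 = return_level(loc, scale, 100)
print(f"loc={loc:.2f} scale={scale:.2f} 100-block return level={level_100:.2f}")
```

No comparably simple recipe is available once several quantities must be extreme jointly, which is the gap item (1) identifies.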

(2) Cost prediction. Prediction of financial consequences of extreme events is formulated differently in actuarial science, in operations research, in financial mathematics and in risk management and decision sciences. In the example of pricing structures, the mathematical and statistical models and the embedding of data are different in the insurance (actuarial), the industrial (operations research) and investment (financial mathematics) industries. In particular, the incorporation of data in the form of "estimated" costs may be mathematically formalized or not. To a significant extent, costs are not treated as random, and there is no role for uncertainty associated with the costs themselves or with the process of their estimation. As in other applications where second moments are neglected, the consequences for decision-making can be severe. Serious attention to inclusion of uncertainties in the modeling may offer significant improvement in risk analyses.
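The point in item (2) about neglected second moments can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not a method proposed by the program: the cost is assumed lognormal with a given mean and coefficient of variation, and the figures (an estimated cost of 100 against a reserve of 150) and function names are hypothetical.

```python
import math
import random


def lognormal_params(mean, cv):
    """Parameters (mu, sigma) of a lognormal with the given mean and
    coefficient of variation cv = sd / mean."""
    sigma2 = math.log(1.0 + cv ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def exceedance_probability(mean, cv, reserve, n_draws, rng):
    """Monte Carlo estimate of P(cost > reserve).

    With cv == 0 the cost is a fixed point estimate, so the answer is
    simply 0 or 1.
    """
    if cv == 0:
        return float(mean > reserve)
    mu, sigma = lognormal_params(mean, cv)
    hits = sum(rng.lognormvariate(mu, sigma) > reserve for _ in range(n_draws))
    return hits / n_draws


rng = random.Random(1)
p_point = exceedance_probability(100.0, 0.0, 150.0, 20000, rng)
p_random = exceedance_probability(100.0, 0.5, 150.0, 20000, rng)
print(f"point estimate: {p_point:.3f}, random cost: {p_random:.3f}")
```

Both calls use the same expected cost, yet only the second reveals a non-negligible chance that the reserve is insufficient.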

(3) Expert opinion. Data inadequacies were perhaps the most clearly identified theme at the Iowa State workshop. For some rare events whose risk must be assessed, there are no data; more often there are data of mixed degrees of relevance, and reliance on experts' opinions is needed to avoid rigid specifications of parameters and/or functional forms within risk models that cannot be documented. Various Bayesian methodological techniques can be implemented using prior elicitation and models incorporating expert opinion to produce accurate estimates of risk factors, e.g., ecological risk factors. Examples include toxic waste-contaminated sites, military training and testing activities, wastewater treatment risks, and patho-biological risks.
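One simple way to combine elicited expert opinion with sparse data, in the spirit of item (3), is a conjugate beta-binomial update. The sketch below is a minimal illustration in Python, assuming the expert's judgment about an event probability can be summarized by a mean and an "equivalent sample size"; this elicitation scheme, the function names and the numbers are assumptions for illustration only.

```python
def elicited_beta_prior(expert_mean, equivalent_sample_size):
    """Beta(a, b) prior whose mean is the expert's estimate and whose weight
    matches a hypothetical number of prior observations."""
    a = expert_mean * equivalent_sample_size
    b = (1.0 - expert_mean) * equivalent_sample_size
    return a, b


def update_with_data(a, b, events, trials):
    """Conjugate update of a Beta(a, b) prior with binomial data."""
    return a + events, b + (trials - events)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# The expert judges the event probability to be about 2%, with confidence
# worth roughly 50 observations; field data then show 0 events in 30 trials.
prior = elicited_beta_prior(0.02, 50)
posterior = update_with_data(*prior, events=0, trials=30)
print(f"prior mean={beta_mean(*prior):.4f}, posterior mean={beta_mean(*posterior):.4f}")
```

With no observed events, the data alone would give an estimate of zero; the elicited prior keeps the posterior estimate positive while still moving it toward the data.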

(4) Multi-dimensional decision analysis. In many contexts (e.g., Hurricane Katrina) decisions to mitigate future risks are made by multiple stakeholders (e.g., federal, state and local governments, insurance companies, local businesses) but methods do not exist to integrate these decisions. Existing decision theoretic abstractions and tools may not be adequate to deal with these problems. Since these risks also exist on multiple time and space scales, the difficulties are further complicated by the need to address multi-scale risks.

(5) Models for exposure and threat propagation. Exposure assessment depends critically on the particular models employed as well as the quality of the input data and/or parameter estimates. The same is true in the example of the propagation of an epidemic, and even more certainly for the extension of epidemic models to bio-terrorism. A weakness in many risk analyses arises from the large unobserved variability. In some settings, such as environmental risk, the problem is well-recognized (but not solved). Even with current exposure estimates, it is not possible to identify at-risk individuals with extreme exposure.

1.2.2 Contextual Settings for Risk Assessment and Decisions

(1) Ecological risk assessment. The primary objectives are to measure concentrations of biohazard and bioavailable contaminants in ambient media, to evaluate food chain transfers and, ultimately, to predict ecological risk. Such data can be obtained directly from plants and animals. Often, however, direct field measurements cannot be obtained, in which case efficient spatial sampling approaches are needed.

(2) Risk in the pharmaceutical industry. Issues raised at the Iowa State workshop include the increasing importance of non-randomized clinical trials, multiplicity and asymmetries between null and alternative hypotheses.

1.3 Organization and Program Leadership

The program leaders are Dipak Dey (Univ. of Connecticut), Stephen Pollock (Michigan), David Rios (Universidad Rey Juan Carlos), Lawrence Brown (Univ. of Pennsylvania, National Advisory Committee Liaison), Richard Smith (UNC-CH, Local Scientific Coordinator), Nell Sedransk (SAMSI Directorate Liaison).

The following Scientific Committee provides advice as needed on specific components: David Banks (Duke), Vickie Bier (Univ. of Wisconsin), James Broffitt (Univ. of Iowa), Alicia Carriquiry (Iowa State), Robert Clemen (Duke), Susan Ellenberg (Univ. of Pennsylvania), Herbert Hethcote (Univ. of Iowa), Wolfgang Kliemann (Iowa State), Robert Winkler (Duke), Stan Young (NISS).

1.4 Program Development

The organization of a SAMSI program requires considerable pre-program planning. On October 27-29, 2005, the National Institute of Statistical Sciences and Iowa State University co-sponsored a Workshop on Overarching Issues in Risk Analysis, held in Ames, IA. The workshop was meant to be a "stock-taking" of exciting new research on risk analysis over the past several years, seeking answers to such questions as: What are the high-leverage gaps? What issues span multiple problem contexts? What kinds of collaborations among researchers in the statistical, applied mathematical and decision sciences and domain scientists are needed to carry out the research? All of these threads lead to the SAMSI risk analysis program. For the actual planning, organization and design of this program, the program leaders have held a conference call once per week with the SAMSI liaisons.

1.5 Major Participants

Various senior visitors will play a critical role during the program through their extended presence at SAMSI, including Dipak Dey (University of Connecticut), Pilar Munoz (Universidad Barcelona), David Perry (University at Haifa), David Rios (Universidad Rey Juan Carlos), Sid Resnick (Cornell) and Larry Brown (University of Pennsylvania).

New Researchers (and junior faculty): Nicoleta Serban, Ren Cuirong (University of South Dakota), (two additional new researchers in negotiations with their institutions)

Postdoctoral Fellows: Jayanta Pal (University of Michigan), (two additional candidates with offers pending plus one or two non-SAMSI postdoctoral fellows expected to accompany their mentors to SAMSI)

Graduate Students: There will be various graduate students involved in the program from the Triangle universities. In addition, we anticipate one or more graduate student fellows from outside the Triangle area.

Local Participants: Faculty from the three affiliated universities in the Research Triangle who will be active in the program and will have faculty release time are Richard Smith (Statistics and Operations Research, UNC) and Wenbin Lu (Statistics, NCSU); in addition, faculty from Duke who are expected to be involved without faculty release time are David Banks (ISDS) and one or more faculty from Decision Sciences in the Fuqua School.

1.6 Description of Activities

Workshops: The Opening Workshop will be September 16, 2007 - September 19, 2007. Its principal goal will be to engage a broadly representative segment of the statistical, applied mathematical and decision analysis/operations research communities in formulation and pursuit of specific research activities to be undertaken by the Program Working Groups. There will also be mid-program workshops on focused topics. The first of these will take place in October: Risk: Perception, Policy and Practice. A workshop on Extreme Events: Theory, Prediction and Cost will be held in late January. Other workshops will be organized by the working groups, and a Transition Workshop will be held at the end of the program to disseminate program results and chart a path for future research in the area.

Courses: Team-taught courses will be held at the NISS/SAMSI building during the fall and spring semesters. The fall semester course will begin with an introduction to decision theory as a foundation for risk assessment and management; it will continue with a systematic approach to risk analysis, and then conclude with an introduction to expert opinion elicitation and modeling. The spring semester course will focus on cost prediction approached from several points of view: operations research, decision science and actuarial science. If there is sufficient interest, a second spring semester course will focus on extreme value theory and α-stable processes.

Working Groups: The working groups will meet regularly throughout the program to pursue particular research topics identified during the Opening Workshop or during the January workshop. Each working group consists of SAMSI visitors, postdoctoral fellows, graduate students and local faculty and scientists. In addition, remote participation will be possible for researchers who want to continue to collaborate following the Opening Workshop, especially for those researchers who are able to be in residence at SAMSI for shorter periods of time during the program.

2. Random Media

2.1 Summary

The program will address a number of fundamental issues pertaining to random media including scattering theory in highly discontinuous and random media, time reversal, model development, analysis, and numerical approximation for interface methods, and imaging in random media. The inherent synergy between deterministic, statistical, and physical analysis necessitates a concerted collaboration between applied mathematicians, statisticians, engineers, geologists, and material scientists which has often been absent in the past and is necessary to provide fundamental advances to the field.

2.2 Research Foci

The field of random media is a classical one which is presently receiving widespread attention as new theory, approximation techniques, and computational capabilities are applied to emerging applications. Due to the breadth of the field, the inherent deterministic, stochastic and applied components have typically been investigated in isolation. However, it is increasingly recognized that these components are inextricably coupled and that synergistic investigations are necessary to provide significant fundamental and technological advances in the field. The SAMSI Program on Random Media will provide a forum to investigate statistical and deterministic components of random media for applications including, but not limited to, time reversal, interface problems, imaging in random media, and scattering theory for discontinuous media.

Time Reversal: The component on time reversal will build upon recent analysis and experimental observations that time reversal of waves propagating in disordered media permits refocusing. This somewhat unexpected property has profound ramifications in domains such as wireless communications, medical imaging, nondestructive evaluation, and underwater acoustics. Whereas the behavior of one-dimensional acoustic waves is mathematically and statistically understood, questions regarding multidimensional media remain widely open with the exception of the paraxial wave equation.

Interface Problems: Interface problems arise in a diverse range of applications including multiphase flows and phase transitions in fluid mechanics, thin film and crystal growth simulations in material science, and mathematical biology problems modeled by partial differential equations involving moving fronts. In computational fluid dynamics, electromagnetic scattering and ground water flows, efficient numerical approximations are essential for quantifying the effective properties of the medium due to fluctuations in inhomogeneous and random media. The level set method has proven to be an extremely versatile tool for tracking deformations in shape geometries, moving interfaces, and free boundaries in a number of related applications, and one facet of the program will focus on extensions of this approach to include the effects of random media and stochastic processes. Other aspects of the interface component will focus on modeling and analysis of random interface growth processes including crystal growth and solidification; Monte Carlo, Wiener chaos expansion and homogenization methods for stochastic partial differential equations; and level set methods and Lagrangian formulations (particle approaches) for random media simulations.
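As a rough illustration of how a level set description can be combined with a random medium, the following sketch advects a one-dimensional interface through a randomly perturbed speed field and collects Monte Carlo statistics of its final position. Everything here is an assumption made for illustration: the first-order upwind scheme, the sinusoidal perturbation of the speed, and the names `advect_interface`, `random_speed` and `monte_carlo_interface` are hypothetical and are not taken from the program description.

```python
import math
import random
import statistics


def advect_interface(x0, speed, n=201, t_end=0.4):
    """Solve phi_t + v(x) phi_x = 0 on [0, 1] with phi(x, 0) = x - x0.

    Uses first-order upwind differences (valid for v > 0) and returns
    the location of the zero level set at time t_end.
    """
    dx = 1.0 / (n - 1)
    xs = [i * dx for i in range(n)]
    v = [speed(x) for x in xs]  # assumed strictly positive
    steps = math.ceil(t_end * max(v) / (0.5 * dx))  # CFL number <= 0.5
    dt = t_end / steps
    phi = [x - x0 for x in xs]
    for _ in range(steps):
        new = phi[:]
        # Inflow boundary: ghost value from unit-slope extrapolation.
        new[0] = phi[0] - v[0] * dt
        for i in range(1, n):
            new[i] = phi[i] - v[i] * dt / dx * (phi[i] - phi[i - 1])
        phi = new
    # Locate the zero crossing by linear interpolation between nodes.
    for i in range(n - 1):
        if phi[i] <= 0.0 < phi[i + 1]:
            return xs[i] + dx * (-phi[i]) / (phi[i + 1] - phi[i])
    raise ValueError("interface left the domain")


def random_speed(rng, v0=1.0, eps=0.3):
    """Hypothetical random medium: mean speed v0 with a sinusoidal
    perturbation of relative size eps, random wavenumber and phase."""
    k = rng.randint(1, 5)
    phase = rng.uniform(0.0, 2.0 * math.pi)
    return lambda x: v0 * (1.0 + eps * math.sin(2.0 * math.pi * k * x + phase))


def monte_carlo_interface(n_samples, seed=0, x0=0.2):
    """Mean and standard deviation of the final interface position."""
    rng = random.Random(seed)
    positions = [advect_interface(x0, random_speed(rng)) for _ in range(n_samples)]
    return statistics.mean(positions), statistics.pstdev(positions)


if __name__ == "__main__":
    mean, std = monte_carlo_interface(20)
    print(f"interface position: mean={mean:.3f}, std={std:.3f}")
```

With a constant speed the scheme transports the linear initial profile exactly, which gives a simple sanity check. The random case shows how variability in the medium translates into spread in the interface position.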

Imaging: Imaging problems in random media arise in a number of applications including biomedical imaging and seismic analysis. In the latter category, a detailed knowledge of earth medium heterogeneities is necessary for oil and gas recovery, earthquake and volcanic predictions, and environmental analysis. One fundamental issue involves the multiscale relation between large scale structures, which are considered as deterministic, and small scale heterogeneities, which are considered to be random fluctuations from the deterministic structures. A related issue concerns the analysis of coupled processes.

Scattering Theory: Whereas mathematical scattering theory for one-dimensional regimes is fairly mature, little of the analysis extends to multidimensional media with the exception of the paraxial wave equation. Hence this facet will focus primarily on the development of theory, numerical methods and validation techniques pertaining to scattering theory for multidimensional media.

Porous Media: It is expected that at least one working group will focus on topics pertaining to stochastic transport processes and physics associated with porous media. Aspects of this research will interface with the other focus topics.

2.3 Program Timing and Previous Related Workshops

Within the last ten years, the field of random media has experienced a number of advances which delineate both the potential for the field and fundamental and technological limitations which must be addressed. Whereas there have been workshops and conferences on related subtopics within the last five years, there have been no concerted forums devoted to the combined statistical, mathematical and physical analysis of the full topic. Hence, the SAMSI program for 2007-08 is very timely.

Recent related workshops and conferences include the following.

• Joint Summer Research Conference in the Mathematical Sciences: Waves in Periodic and Random Media (Mount Holyoke College, June 2002) – This conference focused primarily on analytic, numerical and physical perspectives of topics including photonic crystals, wave propagation in linear and nonlinear periodic media, waves in mesoscopic media, and surface waves.

• Workshop on Probing Earth Media having Small-Scale Heterogeneities (Tohoku University, Japan, November 2004) – This conference focused on advances in the field of seismic wave propagation in earth media containing small-scale heterogeneities.

• IMA Annual Program Year Workshop: Imaging from Wave Propagation (Institute for Mathematics and Its Applications, October 2005) – This workshop combined aspects of seismic imaging and wave propagation.

2.4 Organization and Program Leadership

The program leaders are: Russel Caflisch (UCLA), Maarten de Hoop (Purdue University – Chair), Rick Durrett (Cornell University – NAC Liaison), Weinan E (Princeton University), Josselin Garnier (Université Paris VII), William Kath (Northwestern University), George Papanicolaou (Stanford University), Lenya Ryzhik (University of Chicago), Ralph Smith (SAMSI, Directorate Liaison), Chrysoula Tsogka (University of Chicago), Eric Vanden-Eijnden (NYU), Jack Xin (UC Irvine), Wojbor Woyczynski (Case Western Reserve University), and Hongkai Zhao (UC Irvine).

2.5 Major Participants

Long-Term Visitors: The following individuals are presently scheduled to spend between a month and a semester at SAMSI participating in the program: Isaac Klapper (Montana State University), Yimin Xiao (Michigan State University), Laurent Demanet (Stanford University), Taufiquar Khan (Clemson University), Hongkai Zhao (UC Irvine). Additionally, John Strain (Berkeley), Josselin Garnier (Paris VII), Yu Chen (Courant Institute), Xiaofan Li (Illinois Institute of Technology), Maarten de Hoop (Purdue University), and Qiang Du (Penn State) have expressed interest in long-term visits, which are currently under negotiation.

Postdoctoral Fellows: Elaine Spiller (Mathematics, SUNY-Buffalo), Weigang Zhong (Mathematics, Maryland)

Graduate Students: We anticipate the participation of a number of graduate students from both local and non-local universities.

Faculty Releases: Tom Beale (Mathematics, NCSU), Greg Forest (Mathematics, UNC), Kazi Ito (Mathematics, NCSU), Chuanshu Ji (Statistics, UNC), Zhilin Li (Mathematics, NCSU), Mauro Maggioni (Mathematics, Duke)

2.6 Description of Activities

2.6.1 Workshops

Opening Workshop: The Kickoff Workshop and Tutorial will be September 23–26, 2007. The principal goal of the workshop will be to engage a broadly representative segment of the applied mathematical, statistical, physical and engineering communities to determine research directions to be pursued by working groups during the program.

Midprogram Workshops: Several midprogram workshops are anticipated. The number and scope will likely grow as the program evolves.

• Workshop 1: Interface Problems (10/29/07 – 11/2/07): This will focus on applications, models, theory, and numerical applications involving interface and free boundary problems. In addition to presentations, this will likely involve tutorials on the level set method, the phase field method, and the augmented approach and immersed interface method.

• Workshop 2: Interface and Probabilistic Methods in Signal Processing and Inverse Problems (12/3/07 – 12/7/07): This will cover theory and applications in image and speech signal processing, including recognition and separation.

• Workshop 3: Reaction-Diffusion-Advection Fronts in Random Media (2/25/08 – 2/29/08): This workshop will include a tutorial by Jack Xin that focuses on modeling, analysis and probability theory associated with reaction-diffusion-advection fronts in heterogeneous media.

2.6.2 Working Groups

Each of the previously noted areas poses a potential working group topic. Additional working groups will be identified both as the program is developed and as future directions are identified in the Opening Workshop. Potential working groups can be summarized as follows:

• Time Reversal: This group will focus on mathematical and statistical aspects of time reversal for random media.

• Interface Problems: The working group on interface problems will focus on model development, deterministic and statistical analysis, and numerical approximation for a range of applications. One goal of this group is to identify and quantify the effects of random media and stochastic processes on level set methods.

• Imaging Problems in Random Media: This group will investigate statistical aspects such as the stability of imaging operators in background media with random fluctuations. A second focus will be on the characterization of interferometric methods such as virtual source imaging using limited aperture arrays.

• Scattering Theory: This group will focus on the analysis and characterization of waves in multidimensional media.

• Porous Media: This working group will focus on topics pertaining to stochastic transport processes and physics associated with porous media.

2.6.3 University Courses

The course “Numerical Methods for Free Boundary and Moving Interface Problems” will be taught at the NISS/SAMSI building during the fall semester by K. Ito, Z. Li and H. Zhao.

3. Program on Wireless Environmental Sensor Networks

3.1 Summary

Data gathered by wireless sensor networks, either fixed or mobile, pose unique challenges for environmental modeling: a complex system is being observed by a dynamical network. On the one hand, an ideal information-sampling process across a sensor network would be governed by the complex system model that must simultaneously be informed by these data; on the other hand, the network itself is a dynamic system of self-organizing nodes which exhibit both independent and dependent behaviors. This presents a unique opportunity to organize the sensor system so that a local or micro event can trigger a broad or macro observation – or conversely, a macro observation can trigger highly detailed local data gathering. A collaborative effort of statisticians, mathematicians, computational scientists and environmental scientists is required to formulate and to address the canonical modeling questions of what data to gather, how to fit the model, how to assess the uncertainty of inferences and how to reconfigure or relocate the network nodes. Most importantly, this interdisciplinary effort will consider the problems and the consequences of utilizing or directly influencing the network dynamics to optimize data gathering, model assessment and prediction. The principal application will be forest ecosystems, with the objective of understanding how climate change and CO2 impact ecosystems from the cellular level within a leaf to the macroscale of the forest stand.

3.2 Background

Environmental sensor networks have the capability of capturing local and broad-scale information simultaneously; they also have the capacity to respond to sudden change in one location by triggering simultaneous observation across the network and/or by changing the observation frequency or otherwise altering the sampling plan. When different kinds of sensors are mounted together on each platform, this dynamic response capability can operate in a multimodal fashion. The open challenge is to design the observation process (algorithms), to model the environmental system in evolution using the multimodal data (“data fusion”) and to predict environmental response. Simulation is one chief tool for examining the complex environmental system – dynamic sensor network interaction; exploration of real data from one or more environmental systems is another. This program will bring together an interdisciplinary group of ecologists, mathematicians, statisticians, and computer scientists with the objective of formulating a methodology for wireless sensor networks used for environmental modeling:

1) Network → Environmental Complex System Model: to define the modeling and technical challenges related to processes that will be informed by wireless networks, including the nature of spatio-temporal designs;

2) Environmental Complex System Model → Network: to formulate probability-based adaptive sampling processes and to develop the algorithms needed for implementation in the context of environmental research;

3) Environmental Complex System Model → Sensor motion control: to reposition mobile sensors to optimally co-locate active sampling;

4) Data Fusion;

5) Network Data → Model-based Prediction: to develop scalable process models that can be implemented in an inferential framework and used for prediction.

Part of this effort will be general, and we plan to assemble a broad group of scientists who work in different systems to help define the intensity of sampling in space and time that is needed for understanding and prediction. Part will be focused on individual case studies, including a study of biosphere-atmosphere interactions in forest canopies, entailing a range of issues as broad as exchange of CO2 and H2O, C storage, microclimate, and biodiversity, through competition for light and soil moisture. The motivation for this research and its mathematical, statistical and computational complexity are clearly illustrated in the following example. Understanding how changing climate and CO2 will impact ecosystems, including storage of carbon and biodiversity, depends on the capacity to link processes that affect carbon uptake by individual leaves to the growth of whole trees in competitive forest stands. This ‘scaling problem’ has emerged as one of the most important, challenging, and sustaining issues for forecasting consequences of global change. The linkage from leaf to forest is necessary, because we can only measure the direct effects of temperature, CO2, and moisture at the scale of leaves, but the consequences of those effects are most relevant and profound at the level of a forest stand and a landscape. The process models of ecosystem change that will ingest real time environmental data build in assumptions about how data are collected, the traditional realm of statistics. Even standard data types present serious problems for modelers, often met with ad hoc 'inversion' techniques that are not based on probabilistic models and, thus, do not yield probability statements, in the form of confidence and prediction intervals. Bayesian methods are beginning to meet the demands of high-dimensional processes informed by heterogeneous data, building on graphical approaches. They open the potential to assimilate information obtained from entirely novel scales. 
Complementary to advances in modeling are wireless sensor networks, which promise to bridge key gaps in our capacity to observe scales of variation that affect forests (http://www.neoninc.org/documents/sensor_meet1_report.pdf). Given an infinite power supply, there is now a capacity to collect infinitely dense data in space and time. This capacity brings with it new challenges of developing adaptive sampling algorithms that will be controlled by software and based on the ecosystem models that must ingest these data. Despite a proliferation of creative techniques for modeling spatiotemporal data, 'smart' wireless networks extend many of the challenges that are already well-recognized in the statistical community (e.g., non-stationary covariances, non-separable spatio-temporal covariances, parallel models needing state variables at different scales,…) and present new ones. Wireless networks lack infrastructure, consisting of nodes that self-organize for purposes of data collection and transmission. We can begin to think about scalable process models directly informed by highly unbalanced data, as opposed to, say, nested (but rigid) lattices. Sampling efficiency is paramount, because power is supplied by batteries, and data transmission is energy demanding. The need to minimize the transfer of redundant information (e.g., oversampling in space and time) must be weighed against the risk of missing rapid changes and short-term pulses that coarse sampling designs fail to capture. Clearly, sampling must be adaptive and, to the extent possible, regulated within the network. Efficient exploitation of this technology will depend on the capacity to obtain observations that have ‘value’ for (i.e., provide learning in) ecosystem models that simultaneously ingest alternative sources of information. Combining unusually extensive field data, including wireless networks, within a powerful modeling framework, makes it possible to address the properties, or ‘scaling relationships’, that describe how the rapidly changing atmosphere translates from leaf level processes to whole-tree and forest response. A start in this direction would be to develop a graphical modeling structure that allows the complex set of processes that link atmospheric change, leaf physiology, carbon allocation and growth, and population demography to be decomposed into tractable subunits. We anticipate novel implementation of a hierarchical framework to assimilate data from many sources and represent processes at different scales, from half-hourly gas exchange by individual leaves to annual changes in tree size and population density. The graphical framework allows knowledge and uncertainty to propagate across scales and represents a full synthesis of what existing data can tell us. The scaling relationships needed to understand and forecast consequences of global change enter as hypotheses that can be weighed by the data themselves.

A second cogent illustration is the observation of dynamic oceanographic processes. As our understanding of the oceans has advanced, it has become apparent that many critical processes occur at temporal and spatial scales that cannot be effectively sampled with traditional ship-, moored buoy-, or satellite-based approaches. The present generation of oceanographic field programs is fundamentally limited by too few measurements, taken too slowly, and at too great a cost. Traditional studies are particularly limited in their ability to investigate the onset and immediate aftermath of episodic events such as earthquakes, tsunamis, submarine volcanic eruptions, or hurricanes.
While satellite observations have provided oceanographers with a unique global perspective of the ocean, they remain primarily limited to measuring properties at the air-sea interface. This urgent need to advance the understanding of the ocean has resulted in a new paradigm for doing ocean research in a more adaptive way. The NSF Ocean Observatories Initiative Science Plan (see http://www.orionprogram.org) calls for the exploitation of small, low-cost autonomous underwater vehicles that function as mobile ocean observing systems and provide broad spatial coverage. Reacting to short-lived events and changes observed in real time will require sophisticated mobile sensing platforms integrated into the ocean observatory infrastructure. Self-propelled mini-buoys, aerial and underwater vehicles can be remotely commanded to investigate episodic events, thus enhancing opportunities to capture key processes that do not occur in the vicinity of an observatory node. Mobile platforms can feed real-time data into mathematical models, provide in-situ predictions and adapt their behavior accordingly. Although technology provides the physical components of such sensor networks, their potential benefits are not yet being realized. As of today, one fundamental limitation is a lack of understanding on how to jointly address the distributed data fusion and motion coordination problems. Both problems are intimately intertwined. To coordinate their motion, agents need to collaboratively construct in-situ representations of the observed phenomena based on the collected data. Moreover, satisfactory solutions to the distributed data fusion problem rely upon motion coordination strategies that optimize information gathering of the physical processes. Most currently-available algorithms for mobile networks send all data back to a central station, where aggregation is performed. 
While such strategies are sometimes appropriate, they do not enable the network to rapidly adapt to the evolution of physical phenomena. Instead, data aggregation should be performed, at least partially, in-network to allow for decision making on the fly. Rapid adaptation must also rely on intelligent reduction of the computational complexity. Utilizing computational geometry and topology to characterize network configurations, even if continually reorganizing, may allow reduction of computational complexity by minimizing the amount of sensing in addition to minimizing the computation per se. For example, computational complexity, time and cost are reduced when precise localization of an individual mobile sensor can be bypassed in favor of its position within the organization of the network.
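To make the idea of in-network aggregation concrete, the sketch below uses pairwise gossip averaging, one well-known decentralized scheme in which neighboring nodes repeatedly replace their values with the pair's average, so that every node approaches the network-wide mean without any central station. The choice of gossip averaging, the ring topology and the names `ring_edges` and `gossip_average` are illustrative assumptions and are not methods named in the program description.

```python
import random


def ring_edges(n):
    """Edges of a hypothetical ring network with n nodes."""
    return [(i, (i + 1) % n) for i in range(n)]


def gossip_average(readings, edges, rounds, rng):
    """Pairwise gossip: in each round one randomly chosen link is
    activated and its two endpoints both adopt the average of their
    current values. The network sum is preserved at every step."""
    values = list(readings)
    for _ in range(rounds):
        i, j = rng.choice(edges)
        avg = 0.5 * (values[i] + values[j])
        values[i] = values[j] = avg
    return values


if __name__ == "__main__":
    rng = random.Random(7)
    readings = [float(i) for i in range(10)]  # one sensor reading per node
    estimates = gossip_average(readings, ring_edges(10), 2000, rng)
    print("true mean:", sum(readings) / len(readings))
    print("node estimates:", [round(v, 4) for v in estimates])
```

Each exchange involves only two neighbors, so communication stays local. The price is that many rounds may be needed on sparsely connected networks, which is one face of the trade-off between in-network and centralized computation.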

3.3 Scientific Description

The five problems identified above interlace environmental science and ecology with computational mathematics and computer science, applied mathematics, statistics, and engineering. A successful approach to any one of these will therefore require an interdisciplinary working group of ecologists, engineers, mathematicians, statisticians, and computer scientists. Accordingly, the first goal of this program is, from an interdisciplinary point of view, to define the modeling and technical challenges related to processes that will be informed by wireless networks, to identify among the combined disciplines the existing expertise, and to determine the integrated resources to be brought to bear. One class of challenges implicit in pervasive sensor networks is based on the local-to-global transition; as sensors become more miniaturized they return data which is increasingly local. Computational geometry and topology provide a number of novel techniques for converting local combinatorial data into global descriptors. A second class of challenges is in addressing the computational requirements efficiently when the model systems are complex, the networks themselves (either fixed and pervasive or mobile) are complex and dynamic or, under adaptive sampling schemes, quasi-dynamic, and data aggregation/analysis cannot be efficient if it resides solely in a centralized station. The initial step in finding possible approaches to the apparent [computational] complexity problems is distinguishing the elements of these that appear as data fusion problems, model complexity problems, “sensing complexity” problems, adaptive sampling problems and computational complexity. Then suitable integration of formulations and approaches by the varied disciplines can follow.
Once research approaches have been defined, more specific goals in the context of forest ecology will be to develop algorithms needed for adaptive sampling of environmental variables and to develop scalable process models that can be implemented in an inferential framework and used for prediction. This example will provide tangible results and direct implementation to the study of biosphere-atmosphere interactions in forest canopies, entailing a range of issues as broad as exchange of CO2 and H2O, C storage, microclimate, and biodiversity, through competition for light, nutrients, and soil moisture.

3.4 Research Foci

Research foci will be built from themes involving

1) sampling: adaptive sampling and network-triggered observation,

2) computational issues: complexity, data fusion, mapping of algorithms to sensors, and

3) modeling: prediction and uncertainty, especially multiscale modeling,

all with primary application to modeling forest response to global environmental change. Although the specific agendas for the research foci will be determined by the participants at the Planning Workshop and Opening Workshop, potential research topics might be drawn from the following questions.

3.4.1 Sampling from wireless networks:

Cost of spatio-temporal data in terms of both energy and delay: Each sample has a footprint in power in space and time, some value to one or more process models (e.g., importance of parameters in space and time, sensitivity of estimates to the observation), and some cost (e.g., data transmission). Is it possible to derive frameworks such that the utility of each sample exceeds its cost?
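One simple way to make the utility-versus-cost question concrete is sketched below. It assumes, purely for illustration, that a node tracks a scalar quantity following a random walk, that the utility of a sample is the reduction in posterior variance given by the standard scalar Kalman update, and that cost is a fixed number in the same units. The names `variance_reduction` and `schedule_samples` and all parameter values are hypothetical.

```python
def variance_reduction(prior_var, noise_var):
    """Drop in variance from one noisy observation of a Gaussian state:
    prior_var - prior_var * noise_var / (prior_var + noise_var)."""
    return prior_var ** 2 / (prior_var + noise_var)


def schedule_samples(steps, process_var, noise_var, cost,
                     value_per_var=1.0, init_var=1.0):
    """Return the time steps at which sampling is worth its cost.

    Uncertainty grows by process_var each step (random-walk model).
    A sample is taken only when its utility, value_per_var times the
    variance reduction, exceeds the assumed fixed cost.
    """
    var = init_var
    taken = []
    for t in range(steps):
        var += process_var  # uncertainty accumulates between samples
        gain = variance_reduction(var, noise_var)
        if value_per_var * gain > cost:
            var -= gain  # Kalman posterior variance after sampling
            taken.append(t)
    return taken


if __name__ == "__main__":
    for cost in (0.0, 0.3, 1.0):
        times = schedule_samples(50, process_var=0.1, noise_var=0.5, cost=cost)
        print(f"cost={cost}: {len(times)} samples, first few {times[:5]}")
```

Under these assumptions a higher cost leads to sparser sampling, since the node waits until enough uncertainty has accumulated for a sample to pay for itself. A realistic framework would replace the scalar variance with the value of a sample to the full process model.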

Frameworks for adaptive sampling: game-theoretic, reinforcement learning, dynamic experimental design, topology-based optimization. How can a sampling scheme respond to highly non-stationary dynamics in energy-constrained sampling networks? Are modes of operation controlled by adaptive state machines enough?

3.4.2 Computational issues for wireless network data:

Complexity of process models: What is the trade-off between accuracy (in terms of predictive power) and cost for various implementations, ranging from completely in-network computation to the case where the network just reports data based on commands from grid-based models?

Mapping of algorithms to wireless sensor networks: What is efficient integration of local and centralized computation? How should multi-scale computations be assigned to nodes, subnets and a centralized station? To what extent should incoming data determine computation allocation?

Data fusion for networks: How can model-updating or Bayesian approaches to estimation and signal processing decrease the rigidity of distributed data fusion for mobile networks? What are the current computational limits for detection of small-scale features and/or speed of feature evolution?

3.4.3 Environment modeling from sensor networks:

Model complexity and adequacy: In the trade-off of dimensionality and predictive accuracy, what are the diagnostics for excessive vs. insufficient parametrization? Can models be developed so that reduced forms (e.g., deleting submodels, subsets of parameters, or reducing resolution of observations and/or parameter specification) still function simultaneously with near-optimality at several scales?

Prediction Uncertainty: Appropriate modeling of sources of uncertainty due to sampling (e.g., “lost” samples, outliers, bad sensors, measurement noise, and unreliable communication), and integration of data models with process models.

Model adequacy: Is there coherent noise in bio-micrometeorological systems that should drive exploration of new regions of the state space?

Broader goals of the proposed project include the capacity to integrate data and process across the range of scales that define how forested ecosystems respond to global change. This process level understanding is critical for anticipating consequences of human impacts on landscapes. The approach entails integrated models of a complex system and involves heterogeneous data, process understanding, uncertainty, and model selection. It is interdisciplinary, involving global change, ecophysiology, forest dynamics, statistics, and computer sciences. Broader applications will include increased capacity to forecast consequences of global change and the integration of mathematics, statistics, and environmental sciences. The products of our analysis include mathematical and statistical models that can be directly assimilated with global models of climate and CO2 change. Inference on scaling relationships will be implemented in simulation of whole forests to examine potential consequences of changing climate and atmospheric CO2 for forest diversity and carbon sequestration. These results will have immediate application to the problem of forecasting biosphere responses to atmospheric change.

3.5 Program Leadership and Participants

Program Leaders Committee: Zoe Cardon, UCONN; Jorge Cortes, UCSC; Don Estep, CSU; Deborah Estrin, UCLA; Paul Flikkema, NAU; Mark Hansen, UCLA; National Advisory Committee Liaison – Bin Yu, Berkeley; SAMSI Directorate Liaison – Jim Berger; Local Scientific Coordinators – Jim Clark, Duke, and Alan Gelfand, Duke

Confirmed semester or quarter visitors: Jorge Cortes (UC Santa Cruz), Paul Flikkema (Northern Arizona), Rosalba Ignaccolo (Torino), Cuirong Ren (South Dakota State); discussions are ongoing with several others.

Others contacted and potentially interested in shorter term visits: Anastassia Ailamaki, Carnegie Mellon; Peter Arzberger, UCSD; Dennis Baldocchi, UC Berkeley; Bruce Beck, Georgia; Kate Calder, Ohio State; Benoit Courbaud, CEMAGREF; Todd Dawson, UC Berkeley; Julia Downes, North Florida; Martin Haenggi, Notre Dame; Richard Han, Colorado; Leana Golubchik, USC; Robert Gray, Stanford; Hoshin Gupta, Arizona; Tom Harmon, UCLA; William Kaiser, UCLA; George Koch, NAU; Tim Kratz, Wisconsin; Martin Lechowicz, McGill; Mark Lewis, Alberta; Kirk Martinez, U Southampton; Margaret Martonosi, Princeton; Jerry Melillo, Woods Hole; Russell Monson, CSU; Gregory Pottie, UCLA; David Neuhoff, Michigan; Rob Nowak, Wisconsin; Doug Nychka, NCAR; Chris Paciorek, Harvard; John Porter, UVA; Phil Rundel, UCLA; Lindsey Seders, Notre Dame; Chris Shoemaker, Cornell; Keith Smettem, Western Australia; Mani Srivastava, UCLA; Kannan Ramchandran, UC Berkeley; Lang Tong, Cornell; Chris Wikle, Missouri; Steve Wofsy, Harvard; Don Zak, Michigan.

Postdoctoral Fellows: Yongku Kim (Ph.D. Ohio State), XuanLong Nguyen (Ph.D. Berkeley), Michael Porter (postdoctoral associate – primary NCSU).

Local Faculty Fellows: Jim Clark (Duke), Alan Gelfand (Duke), Jun Yang (Duke), Kim Weems (NCSU)

Other local faculty interested in involvement: Pankaj Agarwal, Duke; Jerry Davis, NCSU; Gaby Katul, Duke; Ram Oren, Duke

3.6 Description of Activities

Although this will formally be a Spring, 2008, program, activities will be planned in advance of the program to ensure that the research gets off to a fast start.

3.6.1 University Courses

There will be a Fall, 2007 course taught at SAMSI that may be taken for graduate credit at any of the local universities, and will serve to prepare graduate students and postdoctoral Fellows for the program start in the Spring. There will also likely be a Spring course taught at SAMSI to provide more advanced training on topics relevant to the research of the Working Groups.

3.6.2 Planning Workshop

A planning workshop will be held October 22-24, 2007 at SAMSI. The workshop will involve the program leaders and long-term visitors to the program, as well as select additional experts. The purpose will be to formulate likely working groups for the program, to enable a fast start to the research in the Spring.

3.6.3 Opening Workshop

The Opening Workshop will engage a broad scientific and mathematical community in the exploration of the varied aspects of interdisciplinary work in this area. Tutorials and presentations will provide an initial orientation by researchers in each of these aspects to those from other disciplines.

3.6.4 Working Groups

The Program will revolve around four to six Working Groups, formalized at the end of the Opening Workshop, based on the affinities and expertise of the participants. Every Working Group will have multi-disciplinary membership. The Working Groups will include senior researchers – visiting or local scientists – plus junior visitors, several SAMSI Post-Docs, and graduate students (with senior visitors particularly encouraged to bring their own graduate students to participate when suitable). All students and Post-Docs will have personal mentors in residence at SAMSI, either drawn from the Program Visitors or from local university faculty released from duty to participate in the Program. Working Groups will meet at least weekly and will set their own research agendas, which will likely involve smaller focused workshops.

3.6.5 Transition Workshop

A Transition Workshop will follow the Program, most likely in Fall, 2008, to allow continuation of the research over the summer to bring specific projects to fruition. The workshop will be both a presentation of Working Group results and a plan for transitioning the research outside of SAMSI.

3.6.6 Relationship to Other Mathematics Institutes’ Programs

Among the past and planned DMS-Institute programs, this one is unique. It shares a topical focus on ecology and the environment with a 2005-6 MBI program on Ecology and Evolution, and it appears to share some network conceptualization ideas with the IPAM 2003-4 program on Large Scale Communication. However, the resemblance in technical focus does not go deeper. This program is also unique in its assembly of a broad-based research community including statisticians and applied and computational mathematicians together with basic scientists and computer scientists.

3.6.7 Leveraging Past SAMSI Programs

Aspects of this program will build on previous SAMSI programs on Large-Scale Computer Models (environmental models) in 2002-3 and Multi-Scale Modeling in 2003-4. Technical work on data fusion and on the network effects on complex system models necessarily encounters random matrices, so this program will interleave naturally and beneficially with the SAMSI program in High-dimensional Inference and Random Matrices in 2006-7. It also will benefit heavily from work done in the 2006-07 program on Development, Assessment and Utilization of Complex Computer Models.

4. Challenges in Dynamic Treatment Regimes and Multistage Decision-Making

4.1 Scientific Overview

The management of chronic disorders, such as mental illness, substance dependence, cancer, and HIV infection, presents considerable challenges. In particular the heterogeneity in response, the potential for relapse, burdensome treatments, and problems with adherence demand that treatment of these disorders involve a series of clinical decisions made over time. Decisions need to be made about when to change treatment dose or type and regarding which treatment should be used next. Indeed, clinicians routinely and freely tailor treatment to the characteristics of the individual patient with a goal of maximizing favorable outcomes for that patient. To a large extent the tailoring of sequences of treatments is based on clinical judgment and instinct rather than a formal, evidence-based process.

These realities have led to great interest in the development of so-called “dynamic treatment regimes” or “adaptive treatment strategies.” A dynamic treatment regime is an explicit, operationalized series of decision rules specifying how treatment level and type should vary over time. The rule at each stage uses time-varying measurements of response, adherence, and other patient characteristics up to that point to determine the next treatment level and type to be administered, thereby tailoring treatment decisions to the patient. The objective in developing such multistage decision-making strategies is to improve patient outcomes over time.

Methodology for designing dynamic treatment regimes is an emerging area that presents challenges in two areas. First, experimental designs for collecting suitable data that can be used efficiently to develop dynamic regimes are required. Second, techniques for using these and other data to deduce the decision-making rules involved in a dynamic regime must be developed. In both areas, input from researchers in a variety of disciplines and collaborations among them will be critical.
Currently, decision rules that make up a dynamic regime are formulated based on clinical experience, disease-based theories, and expert opinion and then evaluated in randomized two-group trials against a usual or standard treatment. Such an experimental approach assesses a dynamic regime as a “package” and hence does not provide evidence on the usefulness of components of the strategy (e.g., timing of treatment alterations and the choices of treatment to which to switch). Accordingly, trials in which patients are randomized to different treatment options at each decision point have been proposed; however, little is known about when such trials should be conducted in lieu of the current approach of melding clinical judgment and expert opinion to formulate decision rules and using the standard two-group paradigm. An alternative approach is to conduct a series of randomized trials, as in agriculture and engineering; again, there is little guidance on how to implement this approach when the goal is to develop a dynamic regime.

Methods to make use of data in developing dynamic regimes involve complex considerations. The construction of optimized decision rules requires incorporating the effects of future decisions when evaluating present decisions, as is well-known to scientists working on improving multistage decision-making. Treatment given at any time may set a patient up for improved response to subsequent treatments or have delayed effects that either enhance or reduce effectiveness of subsequent treatments. The development of a dynamic regime hinges on how one operationalizes the relative importance of patient outcomes over time. Researchers who work on multistage decision problems in other contexts (e.g., control theory) readily recognize these types of issues.

A key challenge is to determine how to collect sufficient information to ascertain the “state” of an individual insofar as making treatment decisions goes. Computer scientists working in the field of reinforcement learning and statisticians working in medical decision-making have quantified the properties that the state must possess; in addition, practical considerations associated with feasibility of collection, cost, and patient burden must be taken into account. Typically, a great deal of information is available at each decision point, and methods for best summarizing this information yet maintaining the summary’s usefulness for deciding on treatment alteration or type are required. Methods for feature extraction developed by statisticians and computer scientists are well-suited to this problem, but the focus on multistage decision-making rather than prediction requires evaluation of these methods from a different perspective.

Computational and inferential challenges arise in all of these endeavors; e.g., complexities of optimizing dynamic regimes can invalidate standard statistical inferential techniques, scientific considerations entail thinking beyond the standard loss functions familiar to statisticians, and the abundance of information at each decision point quickly leads to a “small n, large p” problem and the attendant computational issues. For some disorders, e.g., HIV infection, knowledge of the underlying within-subject biology has led to development of sophisticated mechanistic models for the processes governing disease progression and effect of treatment, which offer a scientific basis (via closed loop control methods) for designing dynamic regimes; however, this approach has not been widely explored or tested in samples of patients in this context.
In summary, the critical need for development of methodology for dynamic treatment regimes, the considerable theoretical and practical challenges, and the relevance of work across a range of disciplines require a forum that fosters the exchange of ideas and the collaboration necessary to propel advances. Moreover, many in the statistical and applied mathematical research communities are unfamiliar with this area and the research opportunities it presents. The proposed SAMSI summer program will bring this area to the attention of statistical and applied mathematical scientists, whose expertise is critical; jump-start the necessary methodological development; and nurture the necessary interdisciplinary collaboration and communication among statisticians, applied mathematicians, computer scientists, and health and behavioral science researchers. We especially hope to intrigue and engage more junior researchers with relevant expertise.
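
As an illustration of why present decisions must be evaluated in light of future decisions, the following is a minimal sketch of backward induction on a toy two-stage regime. It is not a method taken from this proposal: the treatment labels, response probabilities, and expected outcomes are all invented for illustration, and a real application would have to estimate such quantities from data.

```python
# Toy two-stage problem; every number below is invented for illustration.
# Stage-1 treatment: "A" or "B". Intermediate state: "responder" or "nonresponder".
# Stage-2 treatment: "continue" or "switch".

# Hypothetical probability of responding to each stage-1 treatment.
P_RESPOND = {"A": 0.6, "B": 0.5}

# Hypothetical expected final outcome (higher is better) for each history
# (stage-1 treatment, intermediate state, stage-2 treatment).
OUTCOME = {
    ("A", "responder", "continue"): 8.0,
    ("A", "responder", "switch"): 6.0,
    ("A", "nonresponder", "continue"): 2.0,
    ("A", "nonresponder", "switch"): 3.0,
    ("B", "responder", "continue"): 7.0,
    ("B", "responder", "switch"): 6.0,
    ("B", "nonresponder", "continue"): 2.0,
    ("B", "nonresponder", "switch"): 7.0,
}

STAGE2_OPTIONS = ("continue", "switch")


def optimal_regime(p_respond, outcome):
    """Backward induction: solve the stage-2 decisions first, then stage 1."""
    stage2_rule = {}
    stage1_value = {}
    for a1, p in p_respond.items():
        value = 0.0
        for state, prob in (("responder", p), ("nonresponder", 1.0 - p)):
            # Best stage-2 treatment given the history (a1, state).
            best_a2 = max(STAGE2_OPTIONS, key=lambda a2: outcome[(a1, state, a2)])
            stage2_rule[(a1, state)] = best_a2
            # The value of a1 assumes the best stage-2 decision will follow.
            value += prob * outcome[(a1, state, best_a2)]
        stage1_value[a1] = value
    best_a1 = max(stage1_value, key=stage1_value.get)
    return best_a1, stage2_rule, stage1_value


best_a1, stage2_rule, stage1_value = optimal_regime(P_RESPOND, OUTCOME)
# A myopic rule would pick the treatment with the highest response rate.
myopic_a1 = max(P_RESPOND, key=P_RESPOND.get)
print("optimal first treatment:", best_a1)
print("myopic first treatment:", myopic_a1)
print("stage-2 rule:", stage2_rule)
```

In this invented example the myopic choice is treatment A, which has the higher response rate, whereas backward induction selects treatment B, because non-responders to B do much better after switching. This is the kind of delayed effect described above.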

4.2 Program Scope, Timing, And Activities

The program will take place in the summer of 2007 over a two-week period (two successive Monday–Friday periods straddling one weekend). The following activities are planned:

Tutorials (Monday-Wednesday, Week 1). Six tutorials will be held, two per day, to give participants unfamiliar with these areas the foundation necessary to assimilate more advanced developments. All tutorials will be taught by experts in these areas. Suggested names of instructors are given; instructors will be recruited and finalized following approval of the program:

• Day 1: Introduction to Causal Inference (Miguel Hernan) and Introduction to Dynamic Treatment Regimes (Butch Tsiatis)

• Day 2: RL with Additional Discussion of Connections to Classification (Ron Parr) and Computational Challenges with High Dimensional Data (Joelle Pineau)

• Day 3: Introduction to Mechanistic Models and Control Theory (Daniel Rivera) and Introduction to Nonstandard Statistical Inference (Susan Murphy)

Workshop (Thursday-Friday, Week 1). The workshop will feature one or two overview talks presenting the “big picture” of the methodological challenges, followed by more advanced and targeted talks on research relevant to development of dynamic treatment regimes that build on the foundation provided by the tutorials. There will be talks the first full day and second morning, after which participants will be divided into discussion groups centered around four key areas that will form the basis for “brainstorming” by working groups during the next week of the program (below). The discussion groups will develop lists of important challenges and questions centered around their theme.

We believe that the tutorials and workshop could attract a significant audience. A conference on causal inference in January 2006 at Johns Hopkins, which featured sessions on dynamic treatment regimes, was attended by 200 people (registration was closed when the audience became this large).

Working Groups (Monday-Wednesday, Week 2). Four Working Groups will convene to discuss and prioritize challenges in their respective areas. Participants will identify the most pressing problems and outline modes of attack and specific research directions to be pursued. The proposed working group foci and potential lead participants are:

• Difficulties In Statistical Inference (Peter Bartlett, Susan Murphy, Sasha Rakhlin, Jamie Robins).

• Bayesian Approaches (Giovanni Parmigiani, Dan Scharfstein, Peter Thall).

• The Role of Mechanistic Models (Tom Banks, Victoria Chen, Marie Davidian, Daniel Rivera).

• Practical Challenges and Applications (Erica Moodie, Joelle Pineau, Butch Tsiatis).

Working Groups will meet daily according to a schedule that will allow participants to be involved with more than one group if desired. Dr. Murphy has led two successful, small (∼ 20 individuals) meetings of a “network” of diverse researchers from statistics, computer science, engineering, behavioral science, and medicine devoted to this topic, and the willingness of these individuals to participate in several such meetings suggests to us that the Working Groups will be enthusiastically attended by our envisioned 5–6 participants per group. We hope to attract at least 2–3 junior researchers to each group.

Summary and Transitional Workshop (Thursday-Friday, Week 2). Each Working Group will present its results, findings, and recommendations for the future to all Working Group members along with additional participants who may return for this final activity. Discussion will follow each group’s presentation. These presentations and discussions will form the basis for a white paper outlining methodological challenges in the area of dynamic treatment regimes to be written by the Program Leaders for submission to a leading statistical or mathematical science journal, with input from participants.

Because this area is so new and does not have a long-standing and established groundwork of methodological research, we believe that a two-week period is most appropriate and sufficient to achieve the objective of raising awareness and spearheading the necessary activities and connections.

4.3 Program Leadership

The Program Leaders Committee is currently Susan Murphy (University of Michigan), Daniel Scharfstein (Johns Hopkins Bloomberg School of Public Health), Joelle Pineau (McGill University); Local Scientific Coordinators – Marie Davidian and Butch Tsiatis (North Carolina State University); Directorate Liaison – Jim Berger (SAMSI).

4.4 Program Participants

We envision that the Tutorials and Workshop will attract a diverse group of about 50 participants from statistics, applied mathematics, computer science, and the health and behavioral sciences. We expect 20–25 participants to remain in residence during the second week to participate in Working Groups. Strong efforts will be made to recruit junior investigators for participation in the Working Groups.

4.5 Program Outcome

The goal of the Program is to stimulate and delineate the needed statistical/mathematical/algorithmic research in this important area. The desired outcome for the program will be a well-defined, concrete list of specific research directions that should be pursued to advance the needed methodological development. These will be brought to the attention of the research community via the planned white paper summarizing these directions.

5. Summer Program: The Geometry and Statistics of Shape Spaces

5.1 Scientific Overview

Shapes are prevalent in the outside world and in science. They manifest themselves in live animals, plants, landscapes, or in man-made materials, like cars, planes, buildings, and bridges, and they are designed from aesthetic as well as efficacy considerations. Internal organs of humans or other animals also have a commonly accepted, well-defined shape, and their study is an old science called anatomy. For the human mind, there is an intuitive notion of what shapes are, why they differ or look alike, or when they present abnormalities with respect to ordinary observations. Sculpture is the art of rendering existing shapes, or creating new ones, and the fact that artists are still able to provide unambiguous instances of subjects through distorted or schematic representations is a strong indication of the robustness of the human shape recognition engine. However, an analytical description of a shape is much less obvious, and humans are much less efficient at this task, as if the understanding and recognition of forms works without an accurate extraction of their constituting components, which is probably the case. We can tell a squash from an eggplant or a pepper using a simple outline, and even provide a series of discriminative features we can distinguish, but it is much harder to produce a verbal description of any of them that is accurate enough, say, for a painter to reproduce it. It is therefore not surprising that, for mathematics, shape description remains mostly a challenge. The last fifty years of research in computer vision have produced an amazingly large variety of points of view and techniques designed for this purpose: 2D or 3D sets they delineate (via either volume or boundary), moment-based features, medial axes or surfaces, null sets of polynomials, configurations of points of interest (landmarks), to name but a few.
Yet, it does not seem that any of these methods has emerged as ideal, either conceptually or computationally, for describing shapes. Beyond the shape characterization issue, the more ambitious program which has interested a large group of researchers during the last two decades, starting with the seminal work of David Kendall, is the study of shape spaces and their statistics. Here shapes are not only considered individually, but they are seen as variables, belonging to some generally infinite dimensional space which possesses a specific geometry. The theoretical study of such spaces and the definition of computationally feasible algorithmic and statistical procedures have been the subject of a still growing line of work. For example, Kendall’s original contribution focused on collections of landmarks modulo the action of rotation and scale. It has since been extended to the actions of other groups and to plane curves instead of points. Other examples build shape spaces using the medial axis representation. The last few years have seen the emergence and the development of several new techniques, building infinite dimensional Riemannian metrics on curves and other shape representations, involving several groups around the world. Within applied mathematics, the analysis of shape spaces arises at a nodal point in which geometry, statistics and numerical analysis each have a fundamental contribution.

The proposed SAMSI summer program will bring this area to the attention of statisticians, computer scientists and mathematical scientists, whose expertise is critical for the full development of a mathematical and statistical theory of shape which has direct application to a diverse array of problems in computer imaging, medical imaging, and the physical and biological sciences. The program will include a mixture of tutorials, research presentations, and working group activities on the subject. The goal is to provide an entry point into the field to interested students and faculty, and to allow researchers who are specialists in the area to exchange recent results and information.
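
To make the idea of landmarks considered modulo rotation and scale concrete, here is a minimal sketch of a distance between two planar landmark configurations that ignores position, size, and orientation. It assumes the standard complex-number treatment of planar landmark shapes (translation is removed as well as rotation and scale); the function names and the example triangles are hypothetical and are not drawn from the program description.

```python
import cmath
import math


def preshape(landmarks):
    """Center the landmarks and scale to unit size (removes translation and scale)."""
    z = [complex(x, y) for x, y in landmarks]
    centroid = sum(z) / len(z)
    z = [p - centroid for p in z]
    size = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / size for p in z]


def shape_distance(a, b):
    """Geodesic-type distance between the shapes of two planar configurations.

    Taking the modulus of the complex inner product removes rotation,
    because rotating a configuration multiplies it by a unit complex number.
    """
    za, zb = preshape(a), preshape(b)
    inner = sum(p * q.conjugate() for p, q in zip(za, zb))
    return math.acos(min(1.0, abs(inner)))  # clamp guards against rounding


def similarity_transform(landmarks, angle, scale, shift):
    """Rotate, scale, and translate a configuration (hypothetical helper)."""
    rot = scale * cmath.exp(1j * angle)
    moved = [rot * complex(x, y) + complex(*shift) for x, y in landmarks]
    return [(p.real, p.imag) for p in moved]


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
same_shape = similarity_transform(triangle, angle=0.7, scale=2.5, shift=(3.0, -1.0))
other_shape = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]

print(shape_distance(triangle, same_shape))   # essentially zero
print(shape_distance(triangle, other_shape))  # clearly positive
```

The first distance is zero up to rounding because the two triangles differ only by a similarity transformation, while the stretched triangle is a genuinely different shape. The infinite dimensional settings discussed above (curves, diffeomorphisms) require considerably more machinery than this landmark case.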

5.2 Program Scope, Timing, And Activities

The program will take place in the summer of 2007 over a seven-day period (from Saturday through Friday). The meeting consists of two days of tutorials, three days of conferences on shape spaces, and two days of working group meetings.

Tutorials at Radisson RTP (Saturday-Sunday July 7-8):

Two days of tutorials will be held to provide participants with the foundations for infinite dimensional shape spaces, statistics and analysis on these spaces, and a survey of the application of these methods to medical imaging. All tutorials will be taught by experts listed for each area.

July 7: Differential geometry and curvature in infinite dimensional spaces with application to shape spaces by Peter Michor (Universität Wien) and David Mumford (Brown University), and Diffeomorphisms as an infinite dimensional Lie group and the Euler-Poincaré reduction by Laurent Younes (Johns Hopkins University) and Darryl Holm (Imperial College, London) (to be confirmed).

July 8: Probability measures and statistics on function spaces and nonlinear infinite dimensional spaces by Alain Trouve (Ecole Normale Superieure de Cachan), Numerical methods for shape analysis by Stephen Marsland, jointly prepared with Robert McLachlan (Massey University), and Shapes in medical imaging: Computational Anatomy by Michael Miller (Johns Hopkins University).

Workshop: Geometry and Statistics of Shape Spaces at Radisson RTP (Monday – Wednesday, July 9-11). The workshop will feature talks by invited principal lecturers of approximately 45 minutes, shorter invited talks by New Researchers of approximately 20-25 minutes, and a poster session to which all participants are invited to contribute.

Confirmed Principal Lecturers are: Daniel Cremers (University of Bonn), James Damon (University of North Carolina at Chapel Hill), Peter Giblin (The University of Liverpool), John Kent (University of Leeds), Benjamin Kimia (Brown University), Hamid Krim (North Carolina State University), Huiling Le (University of Nottingham), Robert McCann (University of Toronto), Steve Pizer (University of North Carolina at Chapel Hill), Anuj Srivastava (Florida State University), Paul Thompson (University of California, Los Angeles), Keith Worsley (McGill University).

Confirmed New Researcher Lecturers are: Yan Cao (University of Texas, Dallas), Pedro Felzenszwalb (University of Chicago), Tom Fletcher (University of Utah), Kathryn Leonard (California Institute of Technology), Namrata Vaswani (Iowa State University).

There remain several invited speakers still to be confirmed. These lectures will present recent developments related to the geometric and statistical properties associated to shapes and the application of these results to a wide variety of imaging problems. The lectures will emphasize open problems and new directions arising from the results which will be presented.

Working Groups at SAMSI (Thursday-Friday, July 12-13): Four Working Groups will be formed to discuss open problems and promising areas for investigation. The group meetings will each consist of open discussion sessions, one for each half day. This will allow participants to be involved with more than one group if desired. The proposed working groups will consider the following topics:

• Topic 1: The geometry of shape spaces.

• Topic 2: Probabilistic models of shapes.

• Topic 3: Applications of 2D shape analysis.

• Topic 4: Applications of 3D shape analysis.

5.3 Organization and Program Leadership

Program Leaders: Darryl Holm (Imperial College, London), Peter Michor (University of Vienna), Michael Miller (Johns Hopkins University), David Mumford (Brown University), Tilak Ratnanather (Johns Hopkins University), Alain Trouve (Ecole Normale Superieure de Cachan) and Laurent Younes (Johns Hopkins University); Directorate Liaison – James Damon (SAMSI).

5.4 Program Participants

Invitations were extended to approximately 90 participants. So far an additional 15 interested people have learned of the workshop and have registered. We expect that approximately 60-70 participants from statistics, mathematics, computer science, and the health sciences will participate in the Tutorials and Workshop. Of these, we expect approximately 30 to take part in Working Groups. The organizers will strongly encourage junior investigators to participate in the working groups.

5.5 Program Outcome

The goal of the Program is to introduce researchers to these new ideas for studying shapes in imaging problems via statistical and geometric properties defined using shape spaces. The desired outcome of the Program is the identification of new mathematical and statistical directions for the analysis of shape, stimulated by ongoing problems for various imaging modalities.

6. Brainstorming Meetings

As new science research areas emerge, the incorporation of mathematical sciences as research foci often comes well after the other sciences have well-developed agendas that may have involved the application of extant mathematical tools and computational methods. It is not necessarily the case that the tools and methodologies applied are inherently suited to the particular application; moreover these may be utilized in the absence of mathematical formulation of the essential issues.

In consequence, SAMSI is inaugurating a series of small workshops, each with the purpose of examining a new area of interdisciplinary research to determine what, if any, critical role the statistical and mathematical sciences might play in the research and also to determine what kinds of new development in the mathematical sciences might be required.

The first challenge is to ascertain what kinds of mathematical problems need to be addressed in terms of fundamental research, in terms of adaptation to the interdisciplinary research, and in terms of formulation of the scientific research itself.

A new kind of workshop for SAMSI is planned to examine these questions in small groups of statistical and applied mathematical researchers plus scientists working in Nanotechnology. The goal is to outline an agenda for interaction and for initial exploration of fundamental as well as applied research. There are 29 National Nanotechnology Initiative Centers including one at Purdue, a NISS-SAMSI affiliate. Therefore, Purdue was selected to host a “Brainstorming Workshop on Statistical and Mathematical Sciences for Nanoscience and Nanotechnology.” The local Chair for the Workshop will be Mary Ellen Bock, Chair of Statistics at Purdue University and Chair of the National Advisory Council for SAMSI.

This Workshop will consist of one day of open presentations by nanoscientists and nanoengineers followed by one day of closed sessions. Invited participants at this “Math & Nano Workshop” will consist of a team from each of the 10-15 institutions represented: one statistician or mathematician plus one nanoscientist or nanoengineer.

Opening presentations of the science (by the scientists) will provide a basis for discussion during the closed working sessions to delineate possible roles for Statistics and Applied Mathematics. The output from the workshop will be an outline proposing mathematical work from fundamental to methodological to formulational. It is hoped that the team from each institution will continue to function in interdisciplinary fashion upon return home. If these efforts are successful, a follow-on workshop will be held within the next 12 months to reappraise the outlined research possibilities and to consider planning a full-blown SAMSI Program within the next few years.

While there are few statisticians and applied mathematicians working on mathematical research to underpin nanoscience, the work of those few may prove to be groundbreaking. Nanoscale problems present unique challenges because at the nanoscale things operate as individual units, not aggregates: for example, tiny forces that are either ignorable or averageable at the microscale are not ignorable at the nanoscale. Computational demands are unprecedented, so that pure computing power is not a solution; however, Bayesian computation and other self-informing methods may reduce the reconstruction time for a single nano-image from over 24 hours by at least an order of magnitude. Understanding these fundamental differences from aggregate behaviors may both require and inspire new mathematics.

A candidate for a second workshop is Quantum Computation and Quantum Communication. While mathematicians as well as physicists and computer scientists have been involved for some time, statisticians have not; and the problems in Quantum Communication in particular cannot be solved without accounting for the random processes inherent in the photon transmission and reception. It is anticipated that this second Brainstorming Workshop will follow approximately one year after the first and will also be located off-site.

B. Scientific Themes for Later Years

1. Sequential Monte Carlo Methods for Statistical and Scientific Computing (Tentative for 2008-2009)

1.1 Background

Many physical phenomena and much data can be accurately modeled using probabilistic or statistical models. Typical instances of these phenomena/data are the movement of a particle in a random medium, the volatility of the stock market, phylogenetic trees, the world wide web, auctions on eBay, or images. However, even if it is possible to obtain realistic physical models or satisfactory statistical models, only the simplest can be solved without the use of numerical methods. Examples of the need for such numerical methods include the popular Ising model in physics and statistical image processing, non-linear non-Gaussian time series models, semi-parametric Bayesian models, and so on. Fortunately, the advent of enormous, cheap computational power and the development of a plethora of complex inference mechanisms have allowed scientists to develop powerful numerical methods. Amongst numerical techniques, simulation-based methods, especially Monte Carlo (MC) methods, now play a central role. This approach, developed initially in physics during the early days of electronic computing, has been adopted by researchers in numerous scientific fields including computer science, statistics, computational biology, chemistry and physics. Broadly speaking, MC methods are computer-based statistical sampling approaches for optimization, integration, and simulation of complex systems. MC techniques are an essential tool to scientists and, with the increasing complexity of problems to be addressed, these techniques are going to become even more important in the 21st century. In particular, two classes of MC methods have emerged as the tools of choice for analysis and exploration of complex stochastic models and systems that are otherwise effectively impossible to analyze: Markov chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC). MCMC methods are iterative algorithms for sampling from high-dimensional probability distributions.
This problem appears in so many scientific applications that the Metropolis algorithm (the most basic MCMC algorithm) was ranked first in the “Top Ten Algorithms of the 20th Century” selected by computer scientists, applied mathematicians and physicists (http://amath.colorado.edu/resources/archive/topten.pdf). MCMC algorithms have provided enormous scope for realistic statistical modelling and have attracted much attention from statisticians. Indeed, both Bayesians and frequentists need to integrate over possibly high-dimensional probability distributions in situations such as dealing with missing data or marginalizing nuisance parameters in order to make inference about the parameters of interest or to make predictions. MCMC methods make this integration straightforward conceptually. The past few years have witnessed an explosive growth of interest in MCMC methodology from researchers in almost all areas of statistics and this has led to a real revolution, especially in Bayesian statistics; e.g. [4].

However, many scientific problems cannot be addressed using MCMC. Indeed, when massive datasets are available, MCMC algorithms are generally far too cumbersome. Also, when data have to be processed “on-the-fly” because of real-time constraints, MCMC methods are not appropriate because of their batch nature. More recently a set of alternative MC techniques known as Sequential Monte Carlo (SMC) methods has emerged to solve such problems. They have become very popular for solving optimal filtering problems; i.e. estimating the sequence of posterior distributions for non-linear non-Gaussian state-space models. In this context they are known as particle filtering methods. This class of methods has already had a massive impact in time series analysis, signal processing, robotics, computer vision and target tracking, where they have become routine [2].
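
For readers unfamiliar with particle filtering, the following is a minimal sketch of a bootstrap particle filter, which processes observations one at a time. It is an illustration only: the model is a hypothetical linear-Gaussian state-space model, chosen so that the result is easy to sanity-check, and all parameter values and function names are assumptions. The same propagate, weight, and resample loop applies to non-linear, non-Gaussian models once the transition and likelihood lines are changed.

```python
import math
import random


def simulate(n_steps, a, state_sd, obs_sd, rng):
    """Simulate x_t = a * x_{t-1} + noise and y_t = x_t + noise (toy model)."""
    x, states, observations = 0.0, [], []
    for _ in range(n_steps):
        x = a * x + rng.gauss(0.0, state_sd)
        states.append(x)
        observations.append(x + rng.gauss(0.0, obs_sd))
    return states, observations


def bootstrap_particle_filter(observations, n_particles, a, state_sd, obs_sd, rng):
    """Return the estimated posterior mean of the state after each observation."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in observations:
        # 1. Propagate each particle through the state transition.
        particles = [a * x + rng.gauss(0.0, state_sd) for x in particles]
        # 2. Weight by the observation likelihood (log scale for stability).
        log_w = [-0.5 * ((y - x) / obs_sd) ** 2 for x in particles]
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        means.append(sum(wi * x for wi, x in zip(w, particles)))
        # 3. Resample (multinomial) so that particles follow the posterior.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means


def rmse(u, v):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)) / len(u))


rng = random.Random(0)
states, obs = simulate(200, a=0.9, state_sd=0.5, obs_sd=1.0, rng=rng)
estimates = bootstrap_particle_filter(obs, 500, a=0.9, state_sd=0.5, obs_sd=1.0, rng=rng)
print("RMSE of raw observations:", rmse(obs, states))
print("RMSE of filtered estimates:", rmse(estimates, states))
```

Because each observation is absorbed as it arrives, the cost per time step stays fixed, which is the practical contrast with batch MCMC drawn above. On this toy model the filtered estimates should track the hidden state more closely than the raw observations do.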

1.2 Challenges

In statistics and related fields, SMC methods are still closely associated with optimal filtering. This association has somewhat hindered their development and obscured the main principles behind this elegant methodology. It has now been realized that the potential of these methods is much greater than was first imagined [1], [3]. SMC methods are by no means limited to time-varying parameter estimation. Broadly speaking, SMC methods rely on a divide-and-conquer strategy for solving computational problems. Specifically, when a problem (which may be static) has a complex structure, it is often useful to decompose the target structure into a sequence of simpler but dynamically evolving substructures. One can then use the information obtained by solving a sequence of easier and smaller problems to help solve the ultimate target problem. The key to the success of this method is often the ability to gradually update the system from the simplest structure to the target structure. SMC methods can be applied to sample approximately from any sequence of probability distributions of interest and also to estimate their unknown normalizing constants. It is crucial to realize that many important problems can be formulated as special instances of this general framework: sequential and batch Bayesian inference, computation of Bayes factors, computation of p-values, inference in contingency tables, rare event probabilities, optimization, counting the number of objects with a certain property for combinatorial structures, computation of eigenvalues and eigenmeasures of positive operators, PDEs admitting a Feynman-Kac representation, and so on.

Despite its success, this research area is still young and there are many challenges to address. Like MCMC methods, SMC methods are not black-box methods, and a brute-force application of standard algorithms will inevitably fail for complex stochastic models. For example, standard SMC algorithms perform poorly when applied to high-dimensional models such as the space-time models typically used in environmental and ecological statistics or atmospheric sciences.
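The idea of sampling from a sequence of distributions while estimating their normalizing constants can be illustrated with a bootstrap particle filter, the simplest SMC algorithm in the filtering setting. The sketch below is illustrative only: the linear Gaussian state-space model, its parameter values, and all function names are assumptions made for this example and are not taken from the program description.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def simulate(T, phi=0.9, sd_state=1.0, sd_obs=1.0, rng=None):
    """Simulate x_t = phi * x_{t-1} + v_t and y_t = x_t + w_t (assumed toy model)."""
    if rng is None:
        rng = random.Random(0)
    x = rng.gauss(0.0, 1.0)
    xs, ys = [], []
    for _ in range(T):
        x = phi * x + rng.gauss(0.0, sd_state)
        xs.append(x)
        ys.append(x + rng.gauss(0.0, sd_obs))
    return xs, ys


def resample(particles, weights, rng):
    """Multinomial resampling: draw particles with probability proportional to weight."""
    return rng.choices(particles, weights=weights, k=len(particles))


def bootstrap_filter(ys, n=500, phi=0.9, sd_state=1.0, sd_obs=1.0, rng=None):
    """Return filtered means and an estimate of the log marginal likelihood."""
    if rng is None:
        rng = random.Random(1)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
    means, loglik = [], 0.0
    for y in ys:
        # Propagate each particle through the state equation (the proposal).
        particles = [phi * p + rng.gauss(0.0, sd_state) for p in particles]
        # Weight each particle by the likelihood of the new observation.
        weights = [normal_pdf(y, p, sd_obs) for p in particles]
        total = sum(weights)  # assumed > 0; very distant observations could underflow
        # The average weight estimates p(y_t | y_1..t-1), one factor of the
        # normalizing constant of the current target distribution.
        loglik += math.log(total / n)
        means.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # Resample so that computation concentrates on plausible states.
        particles = resample(particles, weights, rng)
    return means, loglik
```

A call such as `means, loglik = bootstrap_filter(simulate(100)[1])` returns one filtered mean per observation. The sketch tracks a single scalar state; as the text notes, this basic scheme degrades quickly as the state dimension grows.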

1.3 Program

The main objectives of this program are to bring researchers together in a collaborative interdisciplinary environment to: (1) advance methodological and theoretical areas for SMC methods, and (2) identify and begin to address important emerging applied problems (e.g., data assimilation). The program will create an exceptional opportunity for exchange of ideas between communities and help to shape the future of stochastic computation. The program is highly interdisciplinary and ambitious. We will engage with and involve the key players in SMC methods from the following communities (at the very least):

Statistics and Mathematics, Signal Processing, Communications and Coding theory, Tracking and data fusion, Stochastic Control, Computer Vision, Robotics, Physics, Biological Sciences, Econometrics and Quantitative Finance, Oceanography, Environmental/atmospheric Sciences

Moreover, we will specifically include participants from the mainstream optimal filtering community (Kalman filtering/unscented Kalman filtering) as well as more mainstream statistical computation (MCMC methods, variational methods). In this way we aim to achieve a real fusion of ideas both to and from the SMC community, leading to a much more generic range of stochastic computation methods for the future. The program will last for 6 months and will comprise 2 international workshops, plus a core of invited visiting academics staying for periods of several months each. We will specifically include provisions for younger researchers (post-docs, PhD students) to participate, as well as more established individuals. The longer-term visitors will participate in several theme areas with regular meetings and common objectives, organized around various theoretical or applied domains: convergence theory, static or dynamic problems, distributed problems, emerging application areas, and so on.

1.4 Participants and Personnel

Program Leaders Committee: Anthony E. Brockwell (Statistics, Carnegie Mellon University), Arnaud Doucet (Statistics & Computer Science, University of British Columbia), Simon J. Godsill (Engineering, Cambridge University), Raquel Prado (Applied Mathematics and Statistics, University of California Santa Cruz); National Advisory Council Liaison: David Mumford (Brown University); Local Scientific Coordinator: Mike West (Duke University); Directorate Liaison: Jim Berger (SAMSI)

Definite Long-term SAMSI Visitors: • Arnaud Doucet

• Simon Godsill

Potential Participants and Visitors: Christophe Andrieu (Bristol, UK), Bhavik R. Bakshi (Ohio, USA), Andrew Blake (Microsoft Cambridge, UK), Mark Briers (Cambridge, UK), Wolfram Burgard (Freiburg, Germany), Kate Calder (Ohio State University), Mark Coates (McGill, Canada), Dan Crisan (Imperial College, UK), Rong Chen (Illinois, USA), Yuguo Chen (Illinois, USA), Nicolas Chopin (Bristol, UK), Manuel Davy (Lille, France), Nando De Freitas (British Columbia, Canada), Maria De Iorio (Imperial College, UK), Pierre Del Moral (Nice, France), Frank Dellaert (Georgia Tech, USA), Persi Diaconis (Stanford, USA), Petar Djuric (Stony Brook, USA), Michael Dowd (Dalhousie, Canada), Geir Evensen (Bergen, Norway), (Lancaster, UK), Dieter Fox (Washington, USA), Neil Gordon (DSTO, Australia), Peter Grassberger (Jülich, Germany), Fredrik Gustafsson (Linkoping, Sweden), Simon Haykin (McMaster, Canada), Alfred Hero (Michigan, USA), Michael Isard (Microsoft, USA), Ajay Jasra (Imperial College, UK), Jun Liu (Harvard, USA), Tom Higuchi (ISM Tokyo, Japan), T. Kirubarajan (McMaster, Canada), Genshiro Kitagawa (ISM Tokyo, Japan), Chris Kreucher (Michigan, USA), Hans Kunsch (Zurich, Switzerland), Francois LeGland (IRISA, France), Simon Maskell (QinetiQ, UK), Peter Mueller (Texas, USA), Anastasia Papavasiliou (Warwick, UK), Mike Pitt (Warwick, UK), Avi Pfeffer (Harvard, USA), Nick Polson (Chicago, USA), James Rawling (Wisconsin, USA), Tobias Ryden (Lund, Sweden), Christian Robert (Paris, France), Gareth Roberts (Lancaster, UK), Chris Rogers (Cambridge, UK), (Oxford, UK), Sumeetpal Singh (Cambridge, UK), Vladislav B. Tadic (Sheffield, UK), Sebastian Thrun (Stanford, USA), Peter Jan van Leeuwen (Utrecht, Netherlands), Namrata Vaswani (Iowa, USA), Ba-Ngu Vo (Melbourne, Australia), Darren Wilkinson (Newcastle, UK), Patrick Wolfe (Harvard, USA), Junni Zhang (Beijing, China).

Possible New Researchers: A number of the individuals listed above are possible new researchers.

Postdocs: This is an area in which numerous excellent postdoctoral candidates will be available.

Faculty Releases from Partner Universities: There are numerous individuals in the partner universities with considerable interests in sequential Monte Carlo methods.

Affiliates and National Lab involvement: • LANL has a large group interested in sequential Monte Carlo.

• Many university affiliates are heavy players in the area.

• NCAR has significant interests in the area.

References:

[1] Del Moral P. (2004) Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. New York: Springer-Verlag.

[2] Doucet A., de Freitas J.F.G. & Gordon N.J. (eds.) (2001) Sequential Monte Carlo Methods in Practice. New York: Springer-Verlag.

[3] Liu J.S. (2001) Monte Carlo Strategies in Scientific Computing. New York: Springer-Verlag.

[4] Robert C.P. & Casella G. (2004) Monte Carlo Statistical Methods. New York: Springer-Verlag, 2nd edition.


2. Algebraic Methods in Systems Biology and Statistics (Tentative for 2008-2009)

2.1 Summary

In recent years, methods from algebra, algebraic geometry, and discrete mathematics have found new and unexpected applications in systems biology as well as in statistics, leading to the emerging new fields of “algebraic biology” and “algebraic statistics.” Furthermore, there are emerging applications of algebraic statistics to problems in biology. The proposed year-long program will provide a focus for the further development and maturation of these two areas of research as well as their interconnections. The unifying theme for the proposed year-long program is provided by the common mathematical tool set as well as the increasingly close interaction between biology and statistics. The program will allow researchers working in algebra, algebraic geometry, and combinatorics to interact with statisticians and biologists and make fundamental advances in the development and application of algebraic methods to systems biology and statistics. The essential involvement of biologists and statisticians in the program will provide the applied focus and a sounding board for theoretical research.

2.2 Background

Systems Biology

The development of revolutionary new technologies for high-throughput data generation in molecular biology in the last decades has made it possible for the first time to obtain a system-level view of the molecular networks that govern cellular and organism malfunction. Whole genome sequencing is now commonplace, gene transcription can be observed at the system level, and large-scale protein and metabolite measurements are maturing into a quantitative methodology. The field of systems biology has evolved to take advantage of this new type of data for the construction of large-scale mathematical models. System-level approaches to biochemical network analysis and modeling promise to have a major impact on biomedicine, in particular drug discovery. However, many mathematical and computational challenges remain to be addressed.

For instance, the availability of sequenced genomes for multiple species challenges existing sequence alignment methods used in evolutionary biology, and new methods are required; algorithm development is a very active area of research. Even the problem of determining the best alignment of two sequences still poses computational challenges. Sequence comparison is a key technique in evolutionary biology and biomedicine. For instance, gene sequence comparisons have been used to uncover the conservation of certain infection mechanisms in the soybean pathogen Phytophthora sojae and the sudden oak death pathogen Phytophthora ramorum. Another important and related problem in evolutionary biology is the construction of phylogenetic trees describing evolutionary relationships between multiple species. New algorithms are being developed, and here too algebraic techniques show great promise.

The next higher scale of organismal organization is that of cellular biochemical networks regulated by transcription and translation of coding regions in the genome. Gene regulatory and metabolic networks of interest are of a size and complexity that overwhelms current modeling approaches. Network inference methods that do not require large amounts of prior biological information are essential in order to keep up with the capacity for data generation. A concerted effort is underway to develop new mathematical and statistical tools for the modeling and simulation of large-scale biochemical networks. See, e.g., the recent workshop “Dialogue on Reverse-Engineering Assessment and Methods (DREAM)” (http://www.nyas.org/ebriefreps/main.asp?intEBriefID=596). Algebraic techniques have been developed for this purpose and were presented at the DREAM workshop, described in more detail below. Thus, the two main biological foci of the program will be sequence alignment and phylogenetics, and cellular biochemical networks. However, the program will also focus on other biological problems where algebraic techniques have shown promise, for instance the study of secondary RNA structures and the assembly of viruses.

Statistics

It has long been recognized that the geometry of the parameter spaces of statistical models determines in fundamental ways the behavior of procedures for statistical inference. This connection has in particular been the object of study in the field of information geometry, where differential geometric techniques are applied to obtain an improved understanding of inference procedures in smooth models. Many statistical models, however, have parameter spaces that are not smooth but have singularities. Typical examples include hidden variable models such as the phylogenetic tree models and the hidden Markov models that are ubiquitous in the analysis of biological data. Algebraic geometry provides the necessary mathematical tools to study non-smooth models and is likely to be an influential ingredient in a general statistical theory for non-smooth models. In addition to helping with the understanding of established statistical procedures in non-standard settings, algebraic geometric studies of parameter spaces can be helpful for the formulation of new goodness-of-fit statistics. This is of particular interest in the analysis of data from the modern biological experiments described above, because these data are often so high-dimensional that traditional large-sample techniques in fixed dimensions may be inappropriate. The development of statistical theory and methodology inspired by algebraic geometry is an exciting topic of research that brings together the previously rather disconnected communities of algebraic geometry and statistics.

Algebraic Methods

Algebraic biology is emerging as a new approach to modeling and analysis of biological systems using tools from algebra, algebraic geometry, and discrete mathematics. The term was chosen by the organizers of the First International Conference on Algebraic Biology in Tokyo, in 2005. The successor conference is taking place in the summer of 2007 at the Research Institute for Symbolic Computation (RISC) in Linz, Austria, long an international leader in work on symbolic computation. Application areas cover a wide range of molecular biology, from the analysis of DNA and protein sequence data to the study of secondary RNA structures, assembly of viruses, modeling of cellular biochemical networks, and algebraic model checking for metabolic networks, to name a few. As an example, the problem of inferring a gene regulatory network from one or more time course data sets of DNA microarrays is one of the central problems of systems biology. A popular modeling framework for such networks is Boolean networks. However, these have the drawback that reducing the expression level of genes to an ON/OFF state set (the field with two elements) often loses a substantial amount of information. Using more general finite fields, one can represent the inference problem as a problem in computational algebraic geometry, and use Gröbner basis techniques to represent the entire model space consistent with the given data set and select models based on different criteria.

Algebraic statistics is a new field, less than a decade old, whose precise scope is still emerging. The term itself was coined by Giovanni Pistone, Eva Riccomagno and Henry Wynn, with the title of their book Algebraic Statistics: Computational Commutative Algebra in Statistics. That book explains how polynomial algebra arises in problems from experimental design and discrete probability, and it demonstrates how computational algebra techniques can be applied to statistics. The first of these applications have focused on categorical data and include the study of Markov bases and conditional inference, disclosure limitation, and parametric inference, to name a few; the book Algebraic Statistics for Computational Biology, by Pachter and Sturmfels, features additional topics. More recent work has begun to study the geometry of models for continuous variables such as the factor analysis model. The central idea underlying algebraic statistics is that the parameter spaces of many statistical models are (semi-)algebraic sets. The geometry of such possibly non-smooth sets can be studied using tools from algebraic geometry. Drawing heavily on these tools, research on general theory for non-smooth statistical models would address topics such as asymptotics for estimators and test statistics when the true parameter point is a singularity, and approximations to Bayesian integrals as in the Bayesian information criterion.

Research on new statistical methods might focus on the following approach. Consider as an example a phylogenetic tree model for DNA data. The parameter space of this model is a parametrically described subset of a probability simplex, but it can also be described by systems of equality and inequality constraints. Estimators of the equality constraints, also termed invariants, may then be used to build statistics for testing the goodness-of-fit of the phylogenetic tree model. Many problems in computational biology, e.g., sequence alignment, can be described within this framework, as has been observed. This is where algebraic statistics joins algebraic biology as a new methodology for solving problems in systems biology. An excellent example of this synthesis is the much-cited book Algebraic Statistics for Computational Biology by Pachter and Sturmfels.
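The network inference example above can be made concrete in the simplest case, the field with two elements. Every Boolean update rule is a polynomial over F_2 in square-free monomials, so the models consistent with a data set can be found by searching over coefficient vectors. The sketch below does this by brute-force enumeration, which is only a small-scale stand-in for the Gröbner basis techniques mentioned in the text; the function names and the toy data are hypothetical.

```python
from itertools import product


def monomials(n_inputs):
    """Exponent vectors of all square-free monomials in n_inputs variables."""
    return list(product((0, 1), repeat=n_inputs))


def evaluate(coeffs, mons, x):
    """Evaluate a polynomial over F_2 at the 0/1 point x."""
    total = 0
    for c, m in zip(coeffs, mons):
        # A monomial equals 1 exactly when every variable it contains is 1.
        if c and all(xi == 1 for xi, e in zip(x, m) if e):
            total ^= 1  # addition in F_2 is XOR
    return total


def consistent_models(data, n_inputs):
    """All coefficient vectors whose polynomial reproduces every (input, output) pair."""
    mons = monomials(n_inputs)
    return [
        coeffs
        for coeffs in product((0, 1), repeat=len(mons))
        if all(evaluate(coeffs, mons, x) == y for x, y in data)
    ]


# Hypothetical time-course data for one gene with two regulators:
# each pair is (regulator states at time t, gene state at time t + 1).
partial_data = [((0, 0), 0), ((1, 1), 1)]
full_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
```

With only two observed transitions, four of the sixteen possible update rules remain consistent with `partial_data`; the four transitions in `full_data` single out the rule x1*x2 (logical AND). Enumeration grows doubly exponentially in the number of regulators, which is one reason algebraic descriptions of the whole model space are attractive.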

2.3 Related Activities and Programs

As described, the use of algebraic methods in systems biology and statistics is quite recent. Despite this fact there have been a number of meetings devoted to both application areas, in the U.S. and abroad. Events that have helped crystallize the field are:

• Workshop on Computational Algebraic Statistics, AIM, Dec. 2003 (http://www.aimath.org/ARCC/workshops/compalgstat.html);

• Workshop on Geometry, Algebra, and Phylogenetic Trees, Harvey Mudd College, 2004;

• First International Conference on Algebraic Biology, Tokyo, Dec. 2005; organizers: H. Anai and K. Horimoto (http://www.ab2005.jp);

• Clay Mathematics Institute Workshop on Algebraic Statistics and Computational Biology, November 2005; organizers: L. Pachter, B. Sturmfels, S. Sullivant, J. Carlson (http://www.claymath.org/programs/cmiworkshops/ascb/);

• The First Argentine School on Mathematics and Biology, Cordoba, Argentina, 2005 (Laubenbacher and Pachter were plenary speakers). The second school will take place in June 2007;

• AMS Special Session on Algebraic Statistics: Theory and Practice, AMS-MAA-SIAM Joint Meetings, San Antonio, TX, 2006; organizers: E. Allman and S. Sullivant (http://www.math.harvard.edu/ seths/AMS.html);

• Summer School on Algebraic Statistics, Tropical Geometry, and Computational Biology, Nordfjordeid, Norway, June 2006 (lecturers: Pachter, Sturmfels, Sullivant) (http://www.math.uio.no/div/nordfjordeid/astg.html);

• MSRI Summer Workshop on Mathematical Methods in Computational Biology, MSRI, Berkeley, June 2006; organizers: R. Laubenbacher and L. Pachter (http://www.msri.org/calendar/ workshops/WorkshopInfo/330/show workshop);

• IMA Special Year on Applications of Algebraic Geometry 2006-2007; organizers: D. Bertsimas, P. Parrilo, M. Stillman, B. Sturmfels, M. Sudan, R. Thomas (http://www.ima.umn.edu/20062007/);

• Second International Conference on Algebraic Biology, Linz, July 2007; conference chairs: H. Anai, B. Buchberger, H. Hong, K. Horimoto (http://www.risc.uni-linz.ac.at/about/conferences/ab2007/);

• Program on Phylogenetics, Isaac Newton Institute for Mathematical Sciences, Sept.-Dec. 2007 (http://www.newton.cam.ac.uk/programmes/PLG/).

The IMA program included a small component on algebraic statistics, and applications to biology were included in one of the workshops. In total, these events are helping to create a community of researchers working on these topics, including senior faculty, postdoctoral researchers, and graduate students. They have produced a body of techniques and a collection of driving problems. The field is rich in bright young enthusiastic people full of creative ideas. The time is ripe for a year-long program that will help focus and define the field in a way that week-long conferences and workshops are unable to do. It can play a role similar to the tremendous boost provided to computational algebra and algebraic geometry by the Fall 1998 MSRI semester on Symbolic Computation, organized by M.-F. Roy, M. Singer, and B. Sturmfels. Bringing together theoreticians with potential end-users in biology and statistics in a substantive way over an extended period of time will help further crystallize the key problems to be solved. Another important aspect of the program is cultural. Algebraists, geometers, biologists and statisticians need to develop a common language. While several of the mathematicians involved in this research area already have close collaborations with people in other fields, many others do not. A year-long program provides this opportunity.

2.4 Participants

Tentative Organizing Committee

• Peter Beerli, School of Computational Sciences and Department of Biological Sciences, Florida State University (http://people.scs.fsu.edu/ beerli/about.html)

• Andreas Dress, Director, CAS-MPG Partner Institute for Computational Biology, Shanghai (http://www.picb.ac.cn/director1.htm)

• Mathias Drton, Department of Statistics, University of Chicago (http://galton.uchicago.edu/drton/)

• Ina Hoeschele, Department of Statistics, Virginia Tech, and Virginia Bioinformatics Institute (https://www.vbi.vt.edu/article/articleview/277)

• Christine Heitsch, School of Mathematics, Georgia Tech (http://www.math.gatech.edu/heitsch/)

• Serkan Hosten, Department of Mathematics, San Francisco State University (http://math.sfsu.edu/serkan/)

• Reinhard Laubenbacher (Committee Chair), Department of Mathematics, Virginia Tech, and Virginia Bioinformatics Institute (https://www.vbi.vt.edu/article/articleview/74)

• Bud Mishra, Departments of Computer Science, Mathematics, and Cell Biology, Courant Institute, NYU (http://cs.nyu.edu/mishra/)

• Don Richards, Department of Statistics, Pennsylvania State University (http://www.stat.psu. edu/people/faculty/richards.html)

• Seth Sullivant, Department of Mathematics, Harvard University (http://www.math.harvard.edu/seths/)

• Brett Tyler, Department of Plant Pathology and Weed Science, Virginia Tech, and Virginia Bioinformatics Institute (https://www.vbi.vt.edu/article/articleview/78)


• Ruriko Yoshida, Department of Statistics, University of Kentucky (http://www.ms.uky.edu/ ruriko/)

The proposed organizing committee consists of researchers who have all contributed to the featured research areas (see references) and who collectively bring broad, comprehensive expertise to the program. The committee is a balanced mix of mathematicians, statisticians, and scientists involved in biological data generation.

Partial List of Potential Program Participants

In addition to the members of the organizing committee and the local participants listed below, the following list is representative of the long- and short-term participants we expect in the program:

• Tatsuya Akutsu, Bioinformatics Center, Kyoto University
• Elizabeth Allman, Department of Mathematics, University of Alaska
• Wicher Bergsma, Department of Statistics, London School of Economics
• Joseph Blitzstein, Department of Statistics, Harvard University
• , Microsoft, Cambridge, UK
• Yuguo Chen, Department of Statistics, University of Illinois
• Elena Dimitrova, Departments of Mathematics and Genetics, Clemson University
• Vanja Dukic, Department of Health Sciences, University of Chicago
• Martin Feinberg, Department of Chemical Engineering, Ohio State University
• Stephen Fienberg, Department of Statistics, Carnegie-Mellon University
• Abdul Jarrah, Virginia Bioinformatics Institute, Virginia Tech
• , Department of Statistics, Oxford University
• Hao Li, Department of Biochemistry, University of California, San Francisco
• Jim Lund, Department of Biology, University of Kentucky
• Oliver Mason, The Hamilton Institute, Ireland
• Chris Meek, Microsoft
• Lior Pachter, Department of Mathematics, UC Berkeley
• Carla Piazza, Department of Computer Science, University of Udine, Italy
• Giovanni Pistone, Department of Mathematics, Politecnico Torino
• Olivier Pourquie, Stowers Institute
• Alberto Policriti, Department of Computer Science, University of Udine, Italy
• John Reinitz, Department of Applied Mathematics and Statistics, SUNY, Stony Brook
• Eva Riccomagno, Department of Statistics, University of Genova
• Thomas Richardson, Department of Statistics, University of Washington
• Ilya Shmulevich, Institute for Systems Biology
• Meera Sitharam, Department of CISE, University of Florida
• Alexandra Slavkovic, Department of Statistics, Penn State University
• Eduardo Sontag, Department of Mathematics, Rutgers University
• Russell Steele, Department of Statistics, McGill University
• Brandilyn Stigler, Mathematical Biosciences Institute, Ohio State University
• Michael Stillman, Department of Mathematics, Cornell University
• , Department of Mathematics, UC Berkeley
• Carolyn Talcott, SRI International
• René Thomas, Center for Nonlinear Phenomena and Complex Systems, University of Brussels, Belgium
• Martin Wainwright, Department of Statistics/EECS, UC Berkeley
• Nanny Wermuth, Department of Biostatistics, Chalmers/Gothenburg
• Bin Yu, Department of Statistics, UC Berkeley

2.5 Local Connections

The proposed program will be able to leverage considerable expertise in symbolic computation, systems biology, and statistics at the three local universities. The NCSU Mathematics Department has an extremely strong research group in algebra and symbolic computation, including Hoon Hong (one of the organizers of Algebraic Biology 2007), Erich Kaltofen, Michael Singer, Agnes Szanto, and others, who have expressed interest in participating in the program. The program will also attract several other researchers in the NCSU, Duke, and UNC Mathematics Departments who work in algebra, algebraic geometry, combinatorics, and mathematical biology. There are several local researchers working in systems biology, including Philip Benfey (Biology, Duke), Morgan Giddings (Bioengineering, UNC), Alexander Hartemink (Computer Science, Duke), and other researchers affiliated with the newly founded Duke University Institute for Genome Sciences and Policy. Potential participants in statistics include Ian Dinwoodie (Statistics, Duke), Asger Hobolth (NCSU, Stat and Bioinformatics Research Center), Mark Hubert (Statistics/Mathematics, Duke), Eric Stone (Statistics, NCSU), Jeff Thorne (Statistics and Bioinformatics Research Center, NCSU), Mike West (Statistics, Duke), and Zhao-Bang Zeng (Statistics, NCSU). This partial list of faculty whose research is related to the proposed program makes clear that there will be considerable synergy between program participants and local researchers. We are planning for several of them to participate in the organization of program activities.

2.6 Proposed Program

We propose the following main activities for the year, with a tentative list of organizers:

• Opening workshop (This workshop will include mathematicians, statisticians, and biologists and will tentatively take place in early September 2008.);

• Workshop on Modeling and Simulation of Biochemical Networks, organized by R. Laubenbacher, B. Mishra, and others (October 2008);

• Workshop on Statistical and Mathematical Challenges of Biological Data Sets, organized by B. Tyler, Virginia Tech, I. Hoeschele, Virginia Tech, P. Beerli, Florida State U., and others (November 2008)


• Workshop on Algorithm and Software Development for Applications in Biology and Statistics, organized by A. Jarrah, Virginia Bioinformatics Institute, M. Stillman, Cornell University, R. Yoshida, University of Kentucky, and local faculty (December 2008);

• Workshop on Singularities in Bayesian Statistics, organized by R. Steele, McGill University, and B. Sturmfels, UC Berkeley (February 2009);

• Workshop on Algebraic and graphical statistical models, organized by M. Drton, University of Chicago, and others (March 2009);

• Workshop on Molecular Clocks, organized by L. Pachter, UC Berkeley, and others (April 2009)

• Transition workshop (June 2009);

• Working group on computer algebra methods in algebraic biology, organized by local faculty;

• Working group on phylogenetics, organizers TBD.

In addition to these activities we are planning tutorials, as well as lecture series by program participants and local faculty. For instance, Laubenbacher will give a 2-week lecture course on Algebraic models in systems biology during fall 2008, in preparation for the October workshop. In preparation for the November workshop, participants in residence will have the opportunity to learn about data generation in biology either in local laboratories of participating biology faculty or during a visit to the Virginia Bioinformatics Institute Core Laboratory Facility in Blacksburg, VA.

3. Statistics and Dynamics of Neuronal Activity (Scheduling will, in part, depend on budgetary considerations)

3.1 Background

Rob Kass (Stat, CMU) first brought up the idea of a SAMSI program on neuronal activity with Peter Bickel during a visit to Berkeley. The idea had arisen in conversations he had had with Emery Brown (Harvard Medical School). Further discussions have been held that have defined the program more precisely, including with Emery Brown during his visit to Duke in September, 2006.

3.2 Introduction

Dynamical models of the neuron date back to the pioneering work of Hodgkin-Huxley in the 1950s. These models have contributed enormously to our basic understanding of the functioning of nerve cells and their ability to produce and propagate an action potential, synchronize between cells, inhibit firing and many other neuronal events. The analysis of these models has benefited greatly from developments in dynamical systems that emphasize geometric and qualitative features of models. A substantial community in applied mathematics and biophysics has been involved in this research for many years (e.g., Ermentrout and Cowan, 1979; Koch, 1999; Kopell and Ermentrout, 1986; Kopell and Ermentrout, 2002; Rinzel and Ermentrout, 1998).

More recently, statistical problems of fitting neuronal data have received considerable attention (e.g., Brown et al., 1998; Brown, Kass, and Mitra, 2004; Kass, Ventura, and Brown, 2005). The recent work has emphasized the use of point process methodology in analyzing sequences of action potentials, or spike trains. There is now an identifiable sub-field of statistical analysis of neuronal data, exemplified by three international workshops devoted to this subject. (SAND3, the third workshop on Statistical Analysis of Neuronal Data, was held May 11-13, 2006, in Pittsburgh, with Emery Brown and Rob Kass as primary organizers.)

3.3 Proposed SAMSI Activity

This proposal is for a dedicated program in the area of neuronal modeling and data analysis in 2008-9. It will have the explicit goal of bringing together these two scientific strands. A SAMSI program offers the ideal framework for such an initiative as the program fits exactly with its mission to match statistical and applied mathematical developments. Moreover, the brainstorming format of a SAMSI program is the perfect vehicle for finding common ground for these two approaches. The program will be initiated by a planning workshop in fall, 2007 that will assess the area and outline a plan for the subsequent program.

There is considerable activity in this area in Pittsburgh, and one suggestion is that we run a companion research activity node there. This model has been established in the High Dimensional Inference and Random Matrices program at SAMSI with a companion node at Berkeley. Video and web conferencing have successfully been used to integrate a large group at Berkeley into the program activities.

3.4 Scientific Case for the Program

There is potentially enormous benefit to be gained from bringing these two points of view together and seeing how the two perspectives might inform each other. With the physical information that the dynamical models carry, they offer the development of improved statistical models if they can be directly incorporated. For instance, the shape of the action potential can be captured in a dynamical model and its influence on firing could be used critically in the design of statistical models for the firing rate. On the other hand, the deterministic models only reflect the reality of neuronal spiking in a qualitative way and the refinement of the models based on an incorporation of data through statistical methods may lead to a deeper understanding of the models and their range of applicability. This may occur through parameter estimation, as a first step, or more complicated approaches to the assimilation of data into the models and incorporation of noise effects based on systematic statistical procedures. A further benefit may be the extension of statistical analyses to the context of multiple interacting neurons and networks which have been studied extensively through dynamical models.

To date there has been little communication between the mathematical and statistical communities involved in modeling spikes and spike train data. Proposed is a dedicated program in the area of neuronal modeling and data analysis with the explicit goal of bringing together these two scientific strands. The program will be initiated by a planning workshop that will assess the area and outline a plan for the subsequent program.

Interactions between mathematicians and statisticians working with neuronal phenomena have the potential for huge benefits. On the one hand, statistical models could take advantage of detailed mathematical models, and thereby become more effective in guiding scientific inferences based on data. For instance, the shape of the action potential can be captured in a dynamical model and its influence on firing could be used critically in the design of statistical models for the firing rate.

On the other hand, the deterministic models only reflect the reality of neuronal spiking in a qualitative way and the refinement of the models based on an incorporation of data through statistical methods may lead to a deeper understanding of the models and their range of applicability. Statistical methods may be very helpful in parameter estimation, i.e., the fitting of differential equations to data in the presence of noise, and systematic lack of fit, including representation of appropriate uncertainty. A further benefit may be the extension of statistical analyses to the context of multiple interacting neurons and networks which have been studied extensively through dynamical models.

These two approaches, through the use of statistical and dynamical systems techniques respectively, have had enormous impact on neuroscience, and their interaction promises a rich and significant enhancement of the power of each approach. This workshop, and subsequent program, will build on this promise and consider ways in which this idea might be realized through targeted and concrete research efforts.

3.5 Program Leaders

Program leaders include:

• Rob Kass (Statistics, Carnegie Mellon)

• Emery Brown (Harvard Medical School and MIT Health Sciences)

• Satish Iyengar (Statistics, Pittsburgh)

• Bard Ermentrout (Mathematics, Pittsburgh)

• Nancy Kopell (Center for BioDynamics and Mathematics, Boston)

• Jon Rubin (Mathematics, Pittsburgh)

3.6 Participants

This is an area of enormous current interest and it is anticipated that many people from the biomathematics and statistics communities will be interested.

3.7 Planning Workshop

A workshop that will lay the groundwork for the program will be held in Pittsburgh in fall 2007. Plans for this workshop were discussed at a meeting held in Pittsburgh on February 9, 2007.

The format will be based around two main topics, (1) Integrate-and-fire models and (2) Ion channels. For each of these we will have three speakers: a mathematician, a statistician, and a neurophysiologist.

1. Integrate-and-fire: Gabriela Czanner (Harvard), André Longtin (Ottawa), and John White (BU)

2. Ion channels: Jonathan Cohen (Harvard), Fred Sigworth (Yale), and Frank Ball (Nottingham)

In addition, we will have lectures on other topics, especially having to do with (1) imaging and (2) motor function.

1. Imaging: Larry Liebovich, Seong-Gi Kim (Pitt), Riera (Japan/Cuba), David Boas (MIT), Richard Leahy (USC-EE), and Steve Smith (Oxford)

2. Motor control: Reza Shadmehr (Johns Hopkins), Emo Todorov (UCSD), Andy Schwartz, Maurice Smith (Harvard), and Krishna Shenoy (Stanford)

Other contributors could include: Lorin Milescu (NIH), Sven Zenker (Pitt), Larry Biegler (Carnegie Mellon), Hulin Wu (Rochester), and Jim Ramsay (McGill).

References:

• Brown, E.N., Frank, L.M., Tang, D., Quirk, M.C. and Wilson, M.A. (1998) A statistical paradigm for neural spike train decoding applied to position prediction from ensemble firing patterns of rat hippocampal place cells. Journal of Neuroscience, 18: 7411–7425.

• Brown, E.N., Kass, R.E., and Mitra, P.N. (2004) Multiple neural spike trains analysis: state-of-the-art and future challenges. Nature Neuroscience, 7: 456–461.

• Ermentrout, G.B.; Cowan, J.D. (1979) Temporal oscillations in neuronal nets. J. Math. Biol. 7: 265-280.

• Kass, R.E., Ventura, V., and Brown, E.N. (2005) Statistical issues in the analysis of neuronal data, J. Neurophysiology, 94: 8-25.

• Koch, C. (1999) Biophysics of Computation, Oxford.

• N. Kopell and G.B. Ermentrout (2002) Mechanisms of phase-locking and frequency control in pairs of coupled neural oscillators, in Handbook on Dynamical Systems: Toward applications. Ed. B. Fiedler, Elsevier 2: 3-54.

• Kopell, N.; Ermentrout, G.B. (1986) Symmetry and phaselocking in chains of weakly coupled oscillators. Comm. Pure Appl. Math. 39: 623-660.

• J. Rinzel and G. B. Ermentrout, (1998) Analysis of neural excitability and oscillations, in Methods in Neuronal Modeling, MIT press, Cambridge, MA.

Appendix A – Final Program Report: National Defense and Homeland Security

1. Objectives

The SAMSI program on National Defense and Homeland Security achieved its principal purposes of identifying promising research paths for the statistical sciences, applied mathematics and decision sciences in problems of National Defense and Homeland Security (NDHS), and initiating research on them. This effort was especially important because previous efforts by these communities had failed to create a self-sustaining research momentum on NDHS. In addition, few research efforts had spanned the statistical sciences, the applied mathematical sciences and the decision sciences.

2. Working Groups

Four Working Groups operated throughout the year, whose principal function, as in all SAMSI programs, was to organize the research and ensure communication:

• Agricultural Systems, led by Mette Olufsen, Faculty Fellow (Applied Mathematics, NCSU). Barrett Slenning (College of Veterinary Medicine, NCSU) also provided significant leadership.

• Anomaly Detection, led by David Dickey, Faculty Fellow (Statistics, NCSU), and Douglas Kelly, Faculty Fellow (Statistics, UNC).

• Data Confidentiality, led by Lawrence Cox (NCHS) and Alan Karr (NISS).

• Social Networks, led by David Banks (Statistics, Duke).

Each working group had significant external participation. Specifically,

• Greg Rempala and Ryan Gill (University of Louisville) were participants in the Agricultural Systems Working Group, and Michelle Lacey (Tulane University), who was at SAMSI for the fall of 2005, was a regular participant.

• Deepak Agarwal (AT&T Research), Howard S. Burkom (Johns Hopkins University Applied Physics Laboratory), Kevin Ward Drummey (National Security Agency (NSA)), Joe Fred Gonzalez, Jr. (National Center for Health Statistics (NCHS)), Myron Katzoff (NCHS), James Lynch (University of South Carolina), Ted Norminton (Carleton University), Cheolwoo Park (University of Georgia), Henry Rolka (Centers for Disease Control and Prevention (CDC)), and Galit Shmueli (University of Maryland) were regular participants in the Anomaly Detection Working Group.

• Lawrence Cox (NCHS), Joe Fred Gonzalez (NCHS) and James Lynch (University of South Carolina, who spent the spring semester at SAMSI) were regular participants in the Data Confidentiality Working Group.

• Deepak Agarwal (AT&T Research), Edoardo Airoldi (Carnegie Mellon University), Hugh Chipman (Acadia University), Stephen Fienberg (Carnegie Mellon University), Myron Katzoff (NCHS), and Ted Norminton (Carleton University) were regular participants in the Social Networks Working Group.

3. Outcomes

3.1 Program Level

Outcomes at the program level were:

• Significant research accomplishments by the working groups, leading to papers submitted during or after the program year.

• Formation of new collaborations, leading to proposals and research in following years. See §6.2 for an example.

• Extremely positive career impact on participants, especially postdoctoral researchers.

• Strong community interest in the program, leading to engagement in the form of research visits and workshop participation.

3.2 Working Groups

Each working group was asked to produce a “sound bite” summary of its plans and to identify one or more outcomes that it would consider to be major successes. Not all did so, but each addressed high-level goals. Each working group also developed a detailed research agenda that both focused energy and defined measures of success.

Agricultural Systems. The group set as its major goal, and measure of success, the identification of a state-of-the-art model for predicting the spread of Foot and Mouth Disease (FMD) that can be applied to the study of the disease in North Carolina and the US. Participants from SAMSI and the Research Triangle were Amit Apte (postdoc, UNC), Ping Bai (student, UNC), David Banks (Duke), Thomas Banks (NCSU), Ricardo Cortez (Tulane University; currently housed at UNC), Sava Dediu (postdoc, SAMSI), Anjela Govan (student, NCSU), Christopher Jones (SAMSI and UNC), Michelle Lacey (Tulane University; currently housed at Duke), Michael Last (postdoc, NISS), Hoan Nguyen (postdoc, NCSU), Mette Olufsen (NCSU), Barrett Slenning (NCSU) and Ralph Smith (NCSU). Thrusts of the research included:

• Background presentation “Agricultural Disasters, Natural or Not: Risks and Readiness” by Slenning.

• Literature search for mathematical and statistical models for FMD.

• Review and discussion of items identified by the literature search.

• An introduction to Jackson networks.

As described in §6.1, this group remained active through March of 2007, producing a major multiple-author paper.

Anomaly Detection. A central problem in NDHS is the early detection of events. These might include early signs of a disease outbreak (natural or terrorist induced), imminent disastrous weather events, or any form of terrorist attack. As the nation thinks about strategies for dealing with these issues, the Working Group surveyed statistical methods that might be part of a plan to deal with some of these threats. The group created a web site with descriptions of statistical methods applicable to the kinds of data involved in NDHS issues. Examples include time series methods, type I versus type II error tradeoffs, and outlier and change point detection. SAMSI- and Research Triangle-based participants were David Banks (Duke), M. J. Bayarri (SAMSI), Gauri Datta (University of Georgia, visiting SAMSI), Lisa R. Denogean (postdoc, SAMSI), David A. Dickey (NCSU), Joyee Ghosh (student, Duke), Shenek Heyward (student, NCSU), Yajun Mei (Fred Hutchinson Cancer Center, visiting SAMSI), Bahjat Qaqish (UNC), Curtis Storlie (postdoc, NCSU/SAMSI) and Francisco Vera (postdoc, NISS and SAMSI). Research thrusts:

• Study of syndromic surveillance, which is roughly described as statistical analysis of indicators from many sources, for example, admissions to several hospitals for flu-like symptoms as well as sales of over-the-counter products related to the flu. One particular technique, the scan statistic, received focused attention, including consideration of how to modify it to take background effects into account.

• Review of related statistical topics, such as time series, outlier detection, sequential testing methods, and multiple testing in a Bayesian framework.

• Cross-fertilization of ideas has been very important. To illustrate, Dickey presented a time series of American Airlines stock volume as part of a time series presentation, and Lynch immediately did some additional analyses that were presented to the group. An initially unsuccessful proposal produced by members of this group is discussed in §6.2.
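To make the scan-statistic idea from the syndromic surveillance thrust concrete, the following is a minimal sketch of a purely temporal scan over daily counts. It is not the working group's method: the function names, the expectation-based Poisson score, and the use of a per-day `baseline` to represent background effects are all illustrative assumptions.

```python
import math

def poisson_excess_score(observed, expected):
    """Expectation-based Poisson log-likelihood ratio for an excess of events.

    Returns 0 when the window shows no excess over its expected count.
    """
    if expected <= 0 or observed <= expected:
        return 0.0
    return observed * math.log(observed / expected) - (observed - expected)

def temporal_scan(counts, baseline, max_width):
    """Scan every window of 1..max_width consecutive days.

    counts   -- observed daily counts (e.g. flu-like hospital admissions)
    baseline -- expected daily counts under "no outbreak" (hypothetical
                background model; this is where background effects enter)
    Returns (best_score, start_index, end_index) with inclusive indices.
    """
    best = (0.0, None, None)
    for start in range(len(counts)):
        observed = expected = 0.0
        for end in range(start, min(start + max_width, len(counts))):
            observed += counts[end]
            expected += baseline[end]
            score = poisson_excess_score(observed, expected)
            if score > best[0]:
                best = (score, start, end)
    return best
```

In practice the significance of the best window would be assessed by recomputing the maximum score on data simulated from the baseline, since the maximum over many overlapping windows does not follow a standard reference distribution.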

Data Confidentiality. Because of increasing availability of external databases, construction of public release databases by federal agencies is verging on the impossible. A long-term goal is to produce an integrated suite of software tools that customize disclosure limitation strategies to statistical characteristics of databases, prolonging by years the ability of the agencies to release useful microdata. SAMSI- and Research Triangle-based participants were Ping Bai (student, UNC), Lisa R. Denogean (postdoc, SAMSI), Joyee Ghosh (student, Duke), Alan Karr (NISS), Robin Mitra (student, Duke), Anna Oganian (postdoc, NISS), Bahjat Qaqish (UNC), Jerome Reiter (Duke), Saki Kinney (student, Duke), Francisco Vera (postdoc, NISS and SAMSI) and Mi-Ja Woo (postdoc, NISS). Visitors during the spring semester were James Lynch (South Carolina) and Xiaodong Lin (Cincinnati; supported by NISS). Thrusts of the research:

• Development of techniques that combine two or more methods for statistical disclosure limitation (SDL), resulting in disclosure risk and data utility performance superior to either method alone.

• Construction of new utility measures for numerical microdata, based on propensity scores and clustering.

• Exploration of SDL for data satisfying constraints such as positivity.

• Utility measures that account for analyses of released microdata that accommodate whatever SDL measures were applied.

• Methods and software systems for statistical analysis of distributed databases without actual integration of the data.

• A new formulation of data swapping that randomizes which attributes are swapped as well as which records are selected for swapping.

• As described in §6.1, this working group remains fully active, now as a component of the NISS affiliates program.
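The data swapping formulation listed among the thrusts above, in which both the records and the attributes to be swapped are chosen at random, can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm of Denogean et al. (2006): the function name, the two rate parameters, and the random pairing of selected records are hypothetical choices.

```python
import random

def doubly_random_swap(records, record_rate, attr_rate, seed=0):
    """Swap attribute values between randomly paired records.

    records     -- list of dicts sharing the same keys (the microdata)
    record_rate -- probability that a record is selected for swapping
    attr_rate   -- probability that a given attribute is swapped within a pair
    Both rates are illustrative parameters, not values from the report.
    """
    rng = random.Random(seed)
    masked = [dict(r) for r in records]  # copy so the input is left untouched
    if not masked:
        return masked
    attrs = list(masked[0])

    # First source of randomness: which records take part in swapping.
    selected = [i for i in range(len(masked)) if rng.random() < record_rate]
    rng.shuffle(selected)

    # Pair selected records; with an odd count the last one stays unswapped.
    for i, j in zip(selected[0::2], selected[1::2]):
        # Second source of randomness: which attributes are exchanged.
        for a in attrs:
            if rng.random() < attr_rate:
                masked[i][a], masked[j][a] = masked[j][a], masked[i][a]
    return masked
```

Because values are only exchanged between records, each attribute's marginal distribution in the released file is identical to the original; what changes is the joint distribution across attributes, which is where the risk-utility tradeoff arises.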

Social Networks. Social network statisticians have developed a rich class of static models. But new challenges that have arisen in counterterrorism, communications networks, and disaster response show the need for robust, reliable models that capture network dynamics. A major success would be the development of a body of theory for dynamic network models and the benchmarking of such models against applications from many different fields. In addition to external participants listed in §2, participants based at SAMSI or in the Research Triangle were David Banks (Duke), Thomas Banks (NCSU), Sava Dediu (postdoc, SAMSI), Chung-Chien Hong (student, NCSU), Alan Karr (NISS), Michael Last (postdoc, NISS), Negash Medhin (NCSU), Hoan Nguyen (postdoc, NCSU) and Eric Vance (student, Duke). Activities of the group:

• Development of multiple models that introduce dynamics into social networks.

• Organization of and participation in a Board on Mathematical Sciences and their Applications (BMSA) Workshop on Statistics in Networks (Washington, September 26–27) and DARPA Workshop on Virtual Worlds and Social Networks (Washington, October 18–19).

• Review of latent space models and ideas for generalization to dynamic network models, of optimization methods for dynamic networks, of estimation of homophily issues and of dynamic process models.

• Application of network models to Hurricane Katrina evacuee surveys.

• Preparation of a research proposal on network models, collaboratively with AT&T Labs, which will allow models to be tested on real data.

• Planning and preparation of papers for a special issue of Computational and Mathematical Organization Theory.

Main research directions:

• Development of a framework for characterizing behavior of dynamic social network models, leading to a paper that has been accepted for publication (Banks et al., 2005).

• Optimization models for social networks (Hong and Medhin, 2006 a,b).

• Testing multiple dynamic network models on data from Enron, AT&T and Hurricane Katrina.

• Assessing agent-based models vis-à-vis dynamic versions of p* models (Vance et al., 2006).

4. Workshops

4.1 Kickoff Workshop

The September 11–14, 2005 kickoff workshop for the program attracted more than 100 attendees, and met the stated goal of informing the composition and activities of the Working Groups. Details of the program are at www.samsi.info/workshops/2005ndhs-workshop200509.shtml.

4.2 Mid-Program Workshops

Three of the working groups held off-site mid-program workshops, which were extremely successful in attracting attendees who were not working group participants, as well as in bringing together in-Triangle and out-of-Triangle participants. Specifics are as follows:

Anomaly Detection. The workshop was held on February 3, 2006 at the National Center for Health Statistics in Hyattsville, MD. Attendance was approximately thirty. Keynote presentations were made by Donald E. Brown (Systems and Information Engineering, University of Virginia), Howard S. Burkom (National Security Technology Department, Johns Hopkins Applied Physics Laboratory) and Carey E. Priebe (Applied Mathematics and Statistics, Johns Hopkins University). Working group participants making presentations were Gauri Datta, David Dickey, Ryan Gill, Michael Last and Francisco Vera. Full details are available at www.samsi.info/200506/ndhs/workinggroup/ad/Workshop.htm.

Data Confidentiality. This workshop was held on March 13, 2006, also at the National Center for Health Statistics in Hyattsville, MD, and attracted more than 45 attendees, including researchers from the Bureau of Labor Statistics, Census Bureau, Energy Information Administration, National Center for Education Statistics and National Center for Health Statistics. Principal presentations were by Lawrence Cox (NCHS), Jay Kim (NCHS) and Aleksandra Slavkovic (Penn State). Presentations on research conducted by the working group were made by Lisa Denogean (SAMSI), Anna Oganian (NISS), Mi-Ja Woo (NISS) and Francisco Vera (NISS and SAMSI). James Lynch and Jerome Reiter led discussion sessions. Full details are available at www.samsi.info/200506/ndhs/workinggroup/dc/workshops/midyear workshop.htm.

Social Networks. The workshop was held on March 2, 2006 at Carnegie Mellon University, the principal location of non-Triangle participants in the working group. Attendance was approximately 25, including Canadian participants Shirley Mills and Ted Norminton. Presentations were made by Alan Karr (NISS), Purnamrita Sarkar (CMU), Anna Goldenberg (CMU), Eric Xing (CMU), Eric Kolaczyk, Steve Henneke (CMU) and Eric Vance (Duke University).

4.3 Transition Workshop

An informal Transition Workshop for the NDHS program was held in Research Triangle Park, NC, in October 2006, in conjunction with the Army Conference on Applied Statistics (ACAS), which was hosted by NISS. Three sessions at the ACAS contained presentations resulting from the NDHS program, informing the nearly 100 attendees about the achievements of the program.

5. Education and Outreach

5.1 Education and Outreach

A seminar course on National Defense and Homeland Security took place in the fall of 2006 at SAMSI, led by Alan Karr. Eight students were enrolled.

5.2 Undergraduate Workshop

An undergraduate workshop based on the National Defense and Homeland Security program took place on March 3–4, 2006. There were approximately 30 attendees, one-half of them women. Presentations were made by David Banks, Negash Medhin, Hoan Nguyen, Eric Vance, Ping Bai, Sava Dediu, Anjela Govan, Lisa Denogean, Francisco Vera, David Dickey, and Alan Karr. Details are available at www.samsi.info/workshops/2005ug-workshop200603.shtml.

6. Follow-On Activities

6.1 Continuing Research

The Agricultural Systems Working Group remained active through March, 2007, producing a paper entitled “Stochastic and Deterministic Models for Agricultural Production Networks” by P. Bai, H. T. Banks, S. Dediu, A. Y. Govan, M. Last, A. Lloyd, H. K. Nguyen, M. S. Olufsen, G. Rempala, and B. D. Slenning, which has been submitted to Mathematical Biosciences and Engineering (Bai et al., 2007). (NDHS program roles: P. Bai = NCSU graduate student, H. T. Banks = NCSU faculty participant, S. Dediu = SAMSI postdoc, A. Y. Govan = NCSU graduate student, M. Last = NISS postdoc, A. Lloyd = NCSU faculty fellow, H. K. Nguyen = SAMSI postdoc, M. S. Olufsen = NCSU faculty fellow and working group leader, G. Rempala = visitor from the University of Louisville, and B. D. Slenning = NCSU faculty participant.)

The Data Confidentiality Working Group became a working group of the NISS Affiliates Program, and continues to meet weekly. It is led by Alan Karr, directorate liaison to the NDHS program. Program participants active in it include L. Cox (NCHS), J. F. Gonzalez (NCHS), A. Karr (NISS), M. Katzoff (NCHS), J. Reiter (Duke) and F. Vera (Clemson). Multiple publications catalyzed by the SAMSI working group were completed: Denogean et al. (2006); Ghosh et al. (2006); Karr et al. (2006a,b); Oganian and Karr (2006); Sanil et al. (2004); Woo et al. (2006).

6.2 FRG Proposal

Several members of the Anomaly Detection Working Group collaborated on a proposal “Bayesian Methods in Syndromic Surveillance: CAR Models and Computational Implementation” submitted to the Focused Research Groups program at NSF/DMS in September, 2006. Unfortunately, the proposal was declined, but alternative sources of funds are being pursued. PIs on the proposal were David Banks (Duke), Gauri Datta (Georgia), Alan Karr (NISS), James Lynch (South Carolina) and Francisco Vera (Clemson).

References

• Bai, P., Banks, H. T., Dediu, S., Govan, A. Y., Last, M., Lloyd, A., Nguyen, H. K., Olufsen, M. S., Rempala, G., and Slenning, B. D. (2007). Stochastic and deterministic models for agricultural production networks. Mathematical Biosciences and Engineering. Submitted for publication.

• Banks, H. T., Karr, A. F., Nguyen, H. K., and Samuels, Jr., J. R. (2005). Sensitivity to noise variance in a social network dynamics model. Q. Appl. Math. To appear. Available on-line at www.samsi.info/reports/index.shtml.

• Denogean, L. R., Karr, A. F., and Qaqish, B. F. (2006). Model-based utility of doubly random swapping. Technical report, National Institute of Statistical Sciences.

• Ghosh, J., Reiter, J. P., and Karr, A. F. (2006). Secure computation with horizontally partitioned data using adaptive regression splines. Computational Statist. and Data Anal. To appear.

• Hong, C.-C. and Medhin, N. G. (2006a). A nonlinear programming approach to the study of social networks. Neural Parallel and Scientific Computation. Submitted for publication.

• Hong, C.-C. and Medhin, N. G. (2006b). Positive and negative affinities model for social networks. Neural Parallel and Scientific Computation. Submitted for publication.

• Karr, A. F., Fulp, W. J., Lin, X., Reiter, J. P., Vera, F., and Young, S. S. (2006a). Secure, privacy-preserving analysis of distributed databases. Technometrics. To appear.

• Karr, A. F., Lin, X., Reiter, J. P., and Sanil, A. P. (2006b). Secure analysis of distributed databases. In Olwell, D., Wilson, A. G., and Wilson, G., editors, Statistical Methods in Counterterrorism: Game Theory, Modeling, Syndromic Surveillance, and Biometric Authentication, pages 237–261. Springer–Verlag, New York.

• Oganian, A. and Karr, A. F. (2006). Combinations of SDC methods for microdata protection. In Domingo-Ferrer, J. and Franconi, L., editors, Privacy in Statistical Databases 2006, volume 4302 of Lecture Notes in Comp. Sci., pages 102–113. Springer–Verlag, Berlin/Heidelberg.

• Sanil, A. P., Karr, A. F., Lin, X., and Reiter, J. P. (2004). Privacy preserving analysis of vertically partitioned data using secure matrix products. J. Official Statist. Submitted for publication. Available on-line at www.niss.org/dgii/technicalreports.html.

• Vance, E., Archie, E. A., and Moss, C. J. (2006). Social networks in African elephants. J. Computational and Mathematical Organization Theory. Submitted for publication.

• Woo, M.-J., Reiter, J. P., Oganian, A., and Karr, A. F. (2006). Global measures of data utility for microdata masked for disclosure limitation. J. Privacy and Confidentiality. Submitted for publication. Available on-line at www.niss.org/dgii/technicalreports.html.

Appendix B – Final Program Report: Astrostatistics

1. Introduction

The principal purpose of the SAMSI program on Astrostatistics was to identify promising research paths for statistical sciences and applied mathematics in problems of observational astronomy, astrophysics and particle physics, and to initiate research on these problems. Astrostatistics is a growing field in which collaborating researchers identify and develop statistical methods for astronomical research. Many problems in astronomy are distinctive and require new methodologies, for example to analyze massive data streams from large, federated sky surveys. The program was organized in collaboration with the Center for Astrostatistics (CASt) (http://astrostatistics.psu.edu) at Penn State. Indeed, the Statistical Challenges in Modern Astronomy IV conference, held in June 2006, was the closing workshop for the SAMSI Astrostatistics program.

2. Working Group Activities

2.1 Personnel and Program Organization

Jogesh Babu (CASt, Penn State), who was in residence at SAMSI during January–May 2006, led the program. Four Working Groups were formed, whose principal functions, as in all SAMSI programs, were to organize the research and ensure communication. In addition, two Intensive Research Sessions were organized. These groups functioned for the entire Spring Semester of 2006. The majority of participants were from outside the Triangle area. All the groups had representation from at least two different fields. Some of these working groups shared common statistical issues, and the groups interacted closely in such cases.

The Exoplanets working group, led by Bill Jefferys (Universities of Texas and Vermont) and Merlise Clyde (Duke University), included: Jogesh Babu (Pennsylvania State University, Department of Statistics), Susie Bayarri (University of Valencia, Department of Statistics and Operations Research), Jim Berger (Duke University, ISDS), Floyd Bullard (Duke University, ISDS), David Chernoff (Cornell University, Department of Astronomy), Pablo de la Cruz (University of Valencia, Observatori Astronomic), Gauri Datta (University of Georgia, Department of Statistics), Peter Driscoll (San Francisco State University), Eric Feigelson (Pennsylvania State University, Department of Astronomy & Astrophysics), Phil C. Gregory (University of British Columbia, Department of Physics and Astronomy), Eric Ford (University of California, Berkeley, Department of Astronomy), Tom Jefferys (Unaffiliated), Michael Last (NISS), Hyunsook Lee (Pennsylvania State University, Department of Statistics), Jaeyong Lee (Seoul National University, Department of Statistics), Tom Loredo (Cornell University, Department of Astronomy), Barbara McArthur (University of Texas at Austin, Department of Astronomy), Raman Narayan (San Francisco State University).

The Surveys and Population Studies working group, led by Tom Loredo (Cornell University), included: Jogesh Babu (Penn State University), Ruth Barrera (National University of Colombia), Brendon Brewer (University of Sydney), Alanna Connors (Eureka Scientific), David Chernoff (Cornell University), Pablo de la Cruz (Universitat de Valencia), Gauri S. Datta (University of Georgia), Matthew Fleenor (University of North Carolina), Martin Hendry (University of Glasgow), Woncheol Jang (Duke University), Kristofer Jennings (Purdue University), Chunglee Kim (Northwestern University), Hyunsook Lee (Penn State University), Kuo-Ping Li (University of North Carolina), Ji Meng Loh (Columbia University), Tom Loredo (Cornell University), Vicent Martinez (Universitat de Valencia), Francisco Vera (NISS), Martin Weinberg (University of Massachusetts, Amherst), Haywood Smith (University of Florida).

The Source Detection and Feature Detection working group, led by David van Dyk (University of California, Irvine), included: Keith Arnaud (NASA, Goddard Space Flight Center), Jim Chiang (GLAST), Alanna Connors (Eureka Scientific), Peter Freeman (CMU), Jiashun Jin (Purdue University), Vinay Kashyap (Smithsonian Astrophysical Observatory), Taeyoung Park (Harvard University), Adam Roy (UC Irvine), Jeff Scargle (NASA, Ames Research Center), Aneta Siemiginowska (Smithsonian Astrophysical Observatory), Alex Young (NASA Goddard Space Flight Center), Yaming Yu (University of California, Irvine), Jogesh Babu (Penn State University), Lingsong Zhang (University of North Carolina), Woncheol Jang (Duke University), Rebecca Willett (Duke University), Eric Feigelson (Penn State), Xiao-Li Meng (Harvard University), Thomas Lee (Colorado State University).

The Gravitational Lensing working group, led by Arlie Petters (Duke), included: Charles Keeton (Rutgers University), Christopher Genovese (CMU), Jogesh Babu (Penn State), Ji Meng Loh (Columbia), Brian Rider (Colorado), Nicholas Robbins (Duke, Grad Student), Francisco Vera (NISS, Postdoc), Liliya Williams (Minnesota), Yaming Yu (U. C. Irvine), Zhengyuan Zhu (UNC).

The Intensive Session on Statistical Issues in Particle Physics, led by Louis Lyons (Oxford, UK), met in March, with heavy emphasis during March 6-16. Members included: Michael Woodroofe (University of Michigan), Kyle Cranmer (Brookhaven Lab), Jim Linnemann (Michigan State), Nancy Reid (University of Toronto), Luc Demortier (Rockefeller University), Joel Heinrich (U. Penn), Giovanni Punzi (Scuola Normale Superiore and INFN), Harrison Prosper (Florida State), Pushpa Bhat (Fermi Lab), Bodhi Sen (University of Michigan), Jogesh Babu (Penn State), John Hartigan (Yale University), Hyunsook Lee (Penn State).

The Intensive Session on Stellar Evolution, led by Bill Jefferys (Universities of Texas and Vermont), met during February 20-23. Members included: Ted von Hippel (University of Texas), Steve DeGennaro (University of Texas), Elizabeth Jeffery (University of Texas), Nathan Stein (University of Texas), David van Dyk (University of California, Irvine), Tom Loredo (Cornell), Theodore Arthur Sande (MIT).

2.2. Achieving Diversity

Three African American researchers were central to the program. Harrison Prosper was a major participant in the Statistical Issues in Particle Physics intensive session, Arlie Petters was the leader of the Gravitational Lensing working group, and Don Richards was on the overall program leaders committee.

Participation of women was also extensive. Merlise Clyde was the co-leader of the Exoplanets working group. Susie Bayarri and Barbara McArthur (who gave a keynote presentation at the kickoff meeting) also participated in this group. Ruth Stella Barrera Rojas and Alanna Connors, who was also on the overall program leaders committee, participated via teleconference in the Surveys and Population Studies and Source and Feature Detection working groups. Hyunsook Lee, a graduate student, participated in all the working groups. In addition, Rebecca Willett, Aneta Siemiginowska, and Ramani Pilla participated in the Source and Feature Detection working group; Nancy Reid and Pushpa Bhat participated in the Statistical Issues in Particle Physics intensive session; and Elizabeth Jeffery participated in the Stellar Evolution intensive session. Two other women, Megan Sosey and Fabrizia Guglielmetti, were involved in the planning meeting for the program.

2.3. Research

Each Working Group had regularly scheduled meetings/teleconferences throughout the program period; most groups also scheduled one or more intensive working sessions that brought many members to SAMSI for face-to-face collaboration. Each Working Group, as well as the two separate intensive sessions on particle physics and stellar evolution, developed a detailed research agenda and followed through.

2.4. Exoplanets

The Exoplanets working group began its work with a two-week intensive session immediately following the January 2006 opening workshop, with many group members in residence at SAMSI for this period. The main goals for this session were to describe the observing processes and planet orbit models in mathematical detail, and to survey existing analysis methods. The many group meetings held during this session quickly got the statisticians “up to speed” on exoplanet modeling, and also allowed astronomers who had been working separately on these problems to learn about the details of each other’s approaches. Fortuitously, this period coincided with the announcement of the discovery of an exoplanet by gravitational lensing, so this new observational technique was also briefly covered. Eric Ford and Barbara McArthur played particularly key roles in the early meetings, Ford due to his collaboration with a leading observing team using the radial velocity method (the California/Carnegie collaboration), and McArthur due to her participation in HST astrometric exoplanet observations (with Bill Jefferys).

The intensive session was a great success in establishing a common foundation between statisticians and astronomers in the group. This allowed the subsequent weekly group meetings to immediately focus on important technical issues. The initial meetings focused on model specification (including optimal parameterization), prior selection (establishing a common set of default priors so investigators could better compare results) and MCMC methods for fitting models to radial velocity data. On the last topic, Eric Ford and Phil Gregory made detailed presentations on two very different approaches: Ford described random walk Metropolis samplers, while Gregory described a parallel tempering algorithm, innovative in its use of a feedback control system to tune sampler parameters. McArthur also described a simpler nonlinear fitting approach. Statisticians helped in improving model parameterizations and MCMC output diagnostics. The presentations enabled Floyd Bullard to quickly write his own exoplanet MCMC algorithm from scratch (based mostly on Ford’s approach).

As part of the working group’s effort to create a test bed of problems, Ford obtained actual data sets (from Geoff Marcy) for two systems with strong planet signals, for comparing methods and testing code. A third challenging data set was obtained by Gregory (from Chris Tinney), which produces significantly multimodal fits. These data sets were analyzed independently by Ford, Gregory, Bullard and McArthur, and the results compared. An important result of that effort was heightened realization of the importance of good parameterizations of models to improve MCMC performance, especially for fitting low-eccentricity orbits (where parameter identifiability issues arise). Statisticians helped astronomers greatly both with this issue and with improving MCMC output diagnostics. Finally, the shared data and default priors enabled identification of bugs in existing code by allowing more detailed comparison of results than had been possible before.

The early work just described focused on getting sound solutions to the estimation problem of finding what orbits fit a data set from a system known to host a planet. Most of the remaining effort of the working group was devoted to finding good methods for detection, i.e., for comparing models without a planet to those with one or more planets. In the Bayesian framework that the community has adopted for these problems, detection requires calculation of Bayes factors, i.e., ratios of the marginal likelihoods of competing models (likelihoods averaged over all model parameters).
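In symbols, the Bayes factor just described can be written as follows. The notation is an assumption introduced here for illustration and does not appear in the report: D denotes the radial velocity data, M_0 and M_1 the zero-planet and one-planet models, θ_k the parameters of model M_k, and π the prior density over those parameters.

```latex
B_{10} \;=\; \frac{p(D \mid M_1)}{p(D \mid M_0)},
\qquad
p(D \mid M_k) \;=\; \int p(D \mid \theta_k, M_k)\,\pi(\theta_k \mid M_k)\,d\theta_k .
```

Each marginal likelihood is the likelihood averaged over the prior, so its computation is an integral whose dimension equals the number of parameters of the corresponding model.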
Besides being of interest for detecting planets in a particular system, marginal likelihoods play a key role in combining results from many systems to infer population properties, and in using existing data from a system to adaptively schedule subsequent observations. Loredo presented his work with Chernoff on adaptive scheduling to further motivate these calculations. Indeed, this is a key motivation for the group’s work. Exoplanet observations are resource-intensive; adaptive scheduling promises to significantly improve observing efficiency for both planet detection and orbit estimation. But the calculations so far have proved daunting.

Marginal likelihood calculation is essentially a multidimensional integration problem. Group members worked closely together to explore a wide variety of marginal likelihood calculation algorithms, some of them based on existing multidimensional integration (“cubature”) methods, but several being innovative. The methods are described in reports by Ford, Clyde, and Loredo. Most of the existing methods that were studied were not successful; these include the harmonic mean estimator and a weighted variant the group invented (explored by Ford); thermodynamic integration (Gregory); and nested sampling (Bullard and Clyde).

Several more innovative methods appear more promising. Several involve modifications of the well-known importance sampling method for Monte Carlo integration. Gregory had success with a region-restricted importance sampler. Ford, Bullard, and Jefferys had some success with Gaussian mixture importance samplers built by subsampling MCMC output; Berger and Clyde helped refine these algorithms. Berger also invented a hybrid ratio estimator (somewhat along the lines of harmonic mean, but more robust); Ford had some success with this method.
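
As a minimal illustration of the importance sampling idea described above (not the group’s actual algorithms), the following Python sketch estimates a marginal likelihood for a toy Gaussian model whose exact answer is known. The toy model, the function names, the single-Gaussian proposal (a one-component stand-in for a Gaussian mixture) and the variance inflation factor are all assumptions made for illustration; in practice the proposal would be fitted to MCMC output, for which the `samples` argument is a placeholder.

```python
import numpy as np


def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def log_joint(mu, y, prior_var):
    """log p(y | mu) + log p(mu) for y_i ~ N(mu, 1) and mu ~ N(0, prior_var)."""
    mu = np.atleast_1d(mu)
    log_like = log_normal_pdf(y[None, :], mu[:, None], 1.0).sum(axis=1)
    return log_like + log_normal_pdf(mu, 0.0, prior_var)


def log_marginal_importance(samples, y, prior_var, n_draws, rng, inflate=2.0):
    """Importance sampling estimate of log p(y).

    The proposal is a Gaussian matched to the posterior samples, with its
    variance inflated (an illustrative choice) so that its tails are wider
    than those of the posterior.
    """
    q_mean = samples.mean()
    q_var = inflate * samples.var()
    draws = rng.normal(q_mean, np.sqrt(q_var), size=n_draws)
    log_w = log_joint(draws, y, prior_var) - log_normal_pdf(draws, q_mean, q_var)
    # Average the weights on the log scale for numerical stability.
    shift = log_w.max()
    return shift + np.log(np.mean(np.exp(log_w - shift)))


def log_marginal_exact(y, prior_var):
    """Closed-form log p(y) for the toy model, used only as a check."""
    n = y.size
    total = y.sum()
    quad = (y ** 2).sum() - prior_var * total ** 2 / (1.0 + n * prior_var)
    return -0.5 * (n * np.log(2.0 * np.pi) + np.log(1.0 + n * prior_var) + quad)
```

A log Bayes factor between two models is then the difference of two such log marginal likelihoods. A single Gaussian proposal is unlikely to be adequate for multimodal posteriors; that is presumably where mixture proposals become useful.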

Two other methods were devised that so far have been tested only with “toy models.” Loredo and Berger devised an adaptive kernel density sampler, borrowing ideas from the multivariate locally adaptive KDE literature. Loredo devised an adaptive cubature method that uses MCMC output points to define simplexes that are used in a multivariate generalization of the trapezoid rule.

Most of the group’s work focused on data from systems with a strong exoplanet signal. As the program progressed, there was a growing realization that different methods may be appropriate in different regimes of signal strength; this issue, which was not appreciated before the SAMSI program, remains to be explored. The group’s marginal likelihood calculations compared zero-planet and one-planet models. Another important open issue is how well the methods scale as the number of planets under consideration increases to two or more, significantly increasing the dimensionality.

Members of the group continue to collaborate on many aspects of this rich family of problems. Their recent work includes both further development of marginal likelihood algorithms and improvements in MCMC algorithms, including use of new, population-based evolutionary MCMC algorithms. The main goal of this ongoing research is to find sound, robust algorithms for observers to use in final data analysis and, especially, in adaptively planning observations.

2.5 Surveys and Population Studies

The Surveys and Population Studies (SPS) working group planned an intensive working session for March 2006. Their early sessions, beginning immediately after the kickoff workshop, thus focused on identifying a baseline of problems and methods that could be fruitfully explored with later, face-to-face collaboration. Astronomer members made several presentations introducing statisticians to various types of astronomical surveys, and to the arcane cornucopia of terminology astronomers use to describe the various biases and distortions survey techniques introduce into the data (via selection criteria and measurement error). Topics covered included size-frequency distributions (“log N – log S” distributions; Loredo presenting), Lutz-Kelker bias in parallax surveys (Haywood Smith), and Malmquist bias in galaxy surveys (Martin Hendry). Statisticians Jang and Babu provided presentations on methods for handling truncation and censoring (i.e., methods from survival analysis).

Two prominent, recurring themes arose in these presentations that guided much of the group’s subsequent work: the tension between model-based and design-based analysis methods, and the important role of measurement error in astronomical survey data. A unique aspect of astronomical surveys is the combined presence of truncation (often random) and measurement error (often heteroscedastic). Model-based approaches can readily handle both complications, but astronomers’ techniques rely on overly restrictive models. Design-based approaches handle truncation in a more robust way than the model-based methods, but no such methods known to astronomers can account for measurement error.

The March intensive working session brought many group members to SAMSI over a two-week period. In addition to regular members, Woodroofe visited SAMSI for the particle physics session, and presented his work on estimating velocity distributions
in dwarf galaxies using shape-restricted estimation to the SPS group. This work provides a handle on the amount of dark matter in dwarf galaxies.

Presentations and collaborations in the March session addressed finding methods that marry the rigorous error handling of parametric modeling with the robustness of product-limit estimators. We discovered that most existing work on surveys in statistics treats only one of these complications (truncation or measurement error); their simultaneous treatment is a research frontier. Jang, Hendry and Loredo in particular discussed these issues extensively amidst SPS working group activity. They are currently collaborating on a proposal to be submitted to the NSF astronomy program, to develop new methodology for jointly treating measurement error and random truncation in cosmological surveys, based on ideas from their SAMSI collaboration. One direction they plan to pursue is a combination of Lynden-Bell-Woodroofe nonparametric density estimation with a de-convolving kernel density estimator accounting for measurement error. A second direction is motivated by a discovery by Loredo in the course of preparing SPS presentations. He found that an oft-cited old paper of astronomer Sir Arthur Eddington on measurement error bias includes overlooked results that anticipated some key ideas of shrinkage estimation, a decade or more before their introduction into statistics. Jang, Hendry and Loredo hope to elucidate this implicit connection between shrinkage and measurement error bias in surveys in their proposed research.

The March session also considered two additional topics. The first, coincidence assessment, was related to earlier topics by the important role of measurement error.
Loredo gave an overview of several recurring astronomical problems that require one to determine whether a newly observed object has a counterpart in a survey; the assessment depends on comparing the measured direction of the object with those of many possible counterparts, all with measurement error, which makes it a multiple testing problem. Loredo outlined a Bayesian approach, but with several open issues. Jang and Woodroofe brought insights from frequentist multiple testing research to the problem. An intriguing direction for future research involves combining false discovery rate (FDR) control techniques with Bayesian modeling. FDR may be used (with a high rate) to generate a subset of data for a subsequent Bayesian analysis. This both reduces the size of the data set, making the subsequent analysis more computationally tractable, and establishes an objective prior for the subset.

The second new topic in the March session was spatial statistics for understanding the galaxy distribution. Pablo de la Cruz worked with Vicent Martinez to provide an extensive overview of the motivating astronomy and current methodology for understanding the structure of the galaxy distribution, both in terms of how the amount of structure depends on scale, and how to measure the topology of the structure. This set the stage for the meetings immediately following the intensive session. In those meetings, Jang and Loh gave detailed presentations on new statistical methods for measuring structure in spatial data. Jang’s presentation covered a technique he developed just prior to the SAMSI program, for accelerating cluster analysis of very large point process data sets by estimating level sets of the underlying density for the process. His starting point is a union-of-balls estimator for the level sets, but Jang has shown how to approximate the estimator with balls centered on regular grid points, rather than on data points.
This allows use of FFTs, greatly accelerating level set estimation and allowing application to much larger data sets. His presentation led to a collaboration with Hendry (they had not
met prior to the Astrostatistics Program). Hendry helped Jang refine his work for cosmological applications, and the two of them recently submitted a paper on this work for publication.

At the first SPS meeting, several astronomer participants raised issues about MCMC and marginal likelihood (ML) algorithms tailored to astronomical survey modeling. As a “bookend,” the final weeks of the program saw investigators returning to this topic, inspired in part by parallel work in the Exoplanets group. Loredo and Chernoff began exploring several new methods for MCMC and ML calculations; their research on them is continuing.
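
The FDR screening step mentioned in the coincidence assessment discussion in this section can be sketched with the Benjamini-Hochberg procedure, a standard FDR control method. The report does not say which FDR procedure the group had in mind, so the choice of procedure, the function name and the example p-values below are assumptions made for illustration.

```python
import numpy as np


def benjamini_hochberg(p_values, rate):
    """Return a boolean mask of the hypotheses selected at the given FDR rate.

    The p-values are sorted, the largest rank k with p_(k) <= rate * k / n is
    found, and the k smallest p-values are selected.
    """
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    thresholds = rate * np.arange(1, n + 1) / n
    passed = p[order] <= thresholds
    selected = np.zeros(n, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()
        selected[order[: k + 1]] = True
    return selected


# Hypothetical p-values for four candidate counterparts.
candidate_mask = benjamini_hochberg([0.04, 0.001, 0.9, 0.008], rate=0.05)
```

Running the screen with a deliberately high rate, as suggested in the text, would pass a larger candidate subset on to the subsequent Bayesian analysis.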

2.6 Source and Feature Detection

Recently launched or soon-to-be launched space-based telescopes that are designed to detect and map ultraviolet, X-ray, and gamma-ray electromagnetic emission are opening a whole new window to study the cosmos. Because the production of high-energy electromagnetic emission requires temperatures of millions of degrees and is an indication of the release of vast quantities of stored energy, these instruments give a completely new perspective on the hot and turbulent regions of the universe. The complexity of the instruments, the complexity of the astronomical sources, and the complexity of the scientific questions lead to a subtle inference problem that requires sophisticated statistical tools. The work of the ‘Source and Feature Detection’ group focused on developing tools for detecting and classifying physical structure in images and spectra collected with these state-of-the-art instruments. Specific accomplishments of this working group include:

• Development of a theoretical framework for the detection of very faint astronomical sources. Data typically consist of just a few photons, some of which may originate from background contamination of the data. The framework included terminology and methods for detecting a source and quantifying how bright an undetectable source might be, given the characteristics of the observation. New methods allow us to leverage information obtained from numerous faint sources to learn about the population of sources.

• Object detection in multi-epoch data. With large scale panchromatic synoptic surveys becoming more common, image co-addition is becoming necessary as new observations are compared in real time with a co-added fiducial sky. The standard co-addition techniques have included straight averages, variance weighted averages, medians, etc. A more sophisticated nonlinear response chi-square method is also used when it is known that the data are background noise limited and the point spread function is homogenized in all channels. Babu (statistician), Mahabal, Djorgovski (astronomers), and Williams (computer scientist) collaborated to develop a robust object detection technique capable of detecting faint sources. The pixel-level analysis, based on Mahalanobis distance, appears to detect sources that are not seen at all epochs and that are normally smoothed out by traditional methods.

• A new implementation of a multi-scale fully-Bayesian method for imaging low-count astronomical sources. This method allows for the quantification of uncertainty in the image, the incorporation of high quality radio-wave images of a source to enhance details in the X-ray or gamma-ray image, and a new method for identifying unexpected physical structure in the image.

• The design of new highly structured models tailored to the specific instrumentation and scientific questions of several NASA missions, including RHESSI (solar data), Chandra (X-ray), GLAST (gamma-ray), and EGRET (gamma-ray). New code was completed for detecting narrow spectral lines with low photon counts. Implementations of other methods for the various instruments are at differing levels of completeness.

• Methods that automate feature detection in solar images are under continued development. These methods aim to identify, track, and monitor the evolution of such solar features as flares, plumes, and sunspot groups. Results using statistical image processing are very promising and appear to be able to automate what has until now required tedious manual labor.

• The working group will host a special session at JSM 2007 (a topic contributed session). There will be five talks on statistical issues in high energy astrophysics and solar imaging (by David van Dyk, Thomas Lee, Vinay Kashyap, James Chiang, and Alex Young).
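
The per-pixel Mahalanobis distance idea in the multi-epoch item above can be illustrated with a short Python sketch. This is not the collaborators’ technique, whose details are not given in this report; the function name, the array layout, and the use of a mask of source-free pixels to estimate the background mean and covariance are assumptions made for illustration.

```python
import numpy as np


def mahalanobis_map(stack, background_mask):
    """Squared Mahalanobis distance of each pixel's multi-epoch flux vector.

    stack:           array of shape (n_epochs, ny, nx), one image per epoch
    background_mask: boolean array of shape (ny, nx), True for pixels assumed
                     to contain only background
    """
    n_epochs = stack.shape[0]
    pixels = stack.reshape(n_epochs, -1).T            # (n_pixels, n_epochs)
    background = pixels[background_mask.ravel()]
    mean = background.mean(axis=0)
    cov = np.atleast_2d(np.cov(background, rowvar=False))
    resid = pixels - mean
    solved = np.linalg.solve(cov, resid.T).T          # cov^{-1} (x - mean)
    d2 = np.einsum("ij,ij->i", resid, solved)
    return d2.reshape(stack.shape[1:])
```

Pixels whose distance exceeds a chosen threshold would be flagged as candidate detections. Because the distance uses the full vector of epochs rather than their average, a source that is bright in only one epoch still contributes its full deviation to the statistic.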

2.7 Gravitational Lensing

The Gravitational Lensing working group established several ongoing collaborations between statisticians, mathematicians, and astronomers. Arlie Petters (mathematics/physics) and Brian Rider (probability) have been working on the statistics of multiple images in microlensing using the Kac-Rice formula from geometric probability theory. In related work, Arlie Petters and Ji Meng Loh (statistics) are continuing the collaboration they formed during the SAMSI program and are currently studying the distribution and statistics of the saddle and minima images using properties of spatial point processes. Arlie Petters and Charles Keeton (astronomy) are continuing their work on the probability distributions of image magnification in microlensing.

Liliya Williams (astronomy) and Zhenyuan Zhu (statistics) started their collaboration on dark matter inversion methods using gravitational lensing and semi-parametric spatial mixed effects models during the SAMSI program. Gravitationally lensed images are often used to reconstruct the mass distribution in galaxies and clusters of galaxies. The gravitational lensing approach to mass reconstruction is usually preferred to other methods because it does not rely on uncertain assumptions about the physical state of the mass.

Furthermore, the visible matter (stars and gas) need not be a good tracer of the invisible mass (dark matter), whose clustering properties are the main scientific motivation for cluster mass reconstruction. In the literature, both parametric and non-parametric mass modeling have been used to reconstruct the mass distribution, each with its own advantages and disadvantages. Williams and Zhu propose to use a semi-parametric mass model for the reconstruction. The mass distribution is modeled as the sum of two parts: a nonlinear parametric mean structure, and the deviation from the mean structure, which is assumed to be a realization of a Gaussian random field (GRF). They use the restricted maximum likelihood method to estimate the parameters of the mean structure and the covariance function of the GRF from the positions of the lens images, and reconstruct the mass distribution non-parametrically using the best linear unbiased predictor (BLUP). Conditional simulation is used to derive multiple realizations of possible mass distributions under the strong lensing constraints.

This methodology has been applied to simulated examples with encouraging preliminary results. They plan to further develop the methodology and algorithms, and apply them to real lens data of clusters of galaxies. One direction they would like to pursue is to relax the Gaussian assumption, and use a transformed GRF (TGRF) for modeling the mass distribution. Another potential direction is to model the mean structure using the visible light from the clusters, and use the GRF/TGRF as a model to reconstruct the unobserved dark matter distribution, which need not be proportional to that of the visible light.

2.8 Intensive session on Stellar Evolution

The period Feb 20-27 was an intensive session at SAMSI where those in residence (van Dyk, Jefferys) worked with Steve de Gennaro, Elizabeth Jeffery and Nathan Stein on various problems. These included improving MCMC sampling, handling field stars, and handling binary stars (a major breakthrough here, since the group thought of a way of doing this that avoids reversible jump or other tricks to sample on spaces of variable dimension). Nathan Stein has produced an internal technical report that documents the group’s software and algorithms, but it has not yet been disseminated publicly.

Over the past year since the mini-workshop, with the involvement of David van Dyk in the project, the group has made significant advances, in particular improving the MCMC sampling, which had been a major impediment. Whereas a year ago all of the tests had been on artificial data, the group is now having some success in analyzing real data. The analysis of real data has shown that none of the stellar evolution codes adequately models the lower main sequence (no current set of isochrones is satisfactory there), and the group is actively exploring ways to improve this situation. They have included the effects of heavy metal abundance in the code and are working on recognizing field stars and distinguishing single stars from binaries. The statistical model is now more detailed and realistic, but is also vulnerable to problems in the underlying stellar evolutionary models.

2.9 Intensive session on Statistical Issues in Particle Physics

The group grew out of the PHYSTAT series of Conferences on “Statistical problems in Particle Physics, Astrophysics and Cosmology”, and in particular the participation of Babu and Feigelson at the Stanford PHYSTAT meeting. The PHYSTAT05 Organizing Committee also endorsed the need for a series of Workshops focused on specific problems. A very important feature of the Working group was the opportunity for experimental Particle Physicists and Astrophysicists to interact with Statisticians in discussions as well as to participate in the more structured talks. This interaction was invaluable for learning new techniques, correcting misconceptions, and for introducing Statisticians to some of the interesting statistical problems in current analyses. The very active intensive session had a series of meetings at SAMSI during the week of March 6th to 10th. The topics included:

• Upper limits in the presence of nuisance parameters. Results and properties of Bayesian and Frequentist approaches to this problem were presented by Heinrich and Punzi respectively. The presentations by statisticians Reid and Woodroofe were very valuable to the physicists in the group.

• Multivariate methods for signal/background separation. Almost every analysis in particle physics involves such a procedure. Prosper discussed very recent results of Bayesian neural networks that showed good behavior with a remarkably small number of training events. He also raised interesting theoretical questions about how to test compatibility between various multi-dimensional distributions, such as those used for training multivariate procedures.

• Goodness of fit with sparse multi-dimensional data; p-values; discovery. With the advent of the new Large Hadron Collider accelerator at CERN in 2007, probably the most crucial question will be assessing the significance of any possible signal for the Higgs boson or for new physics beyond the Standard Model (e.g. supersymmetry, quark and/or lepton substructure, extra dimensions). This is usually assessed in terms of significance p-values for the null hypothesis of the Standard Model. As with upper limits, nuisance parameters cause problems; the possibilities were discussed by Cranmer and Demortier. Further studies are in hand to compare the methods they described, and their properties. It was particularly valuable for Particle Physicists to be exposed to Bayesian methods.

Michael Woodroofe, John Hartigan, Hyunsook Lee and Louis Lyons remained at SAMSI for the rest of March. This enabled an ongoing series of less formal interactions, including with members of other Working Groups, and with members of the Duke Statistics Department. In particular, the question of anomaly detection is common over a wide variety of subjects, ranging from Astrophysics to Medical studies; the interdisciplinary discussions that are readily possible in the SAMSI environment are particularly valuable. Particle Physicists in the past have tended to develop their own methods for dealing with the statistical analysis of their data. It was especially valuable to
have contact with statisticians who have an understanding of practical statistical problems. Beyond the Particle Physics Working Group itself, new links have been forged with statisticians Tom Banks, Jim Berger, David van Dyk and Robert Wolpert. Discussions with astrophysicists Bill Jefferys and Tom Loredo were also most valuable.

The work done at SAMSI served as an important stepping-stone towards a Workshop held at the Banff International Research Station (BIRS) in July 2006 on “Statistical inference Problems in High Energy Physics and Astronomy.” SAMSI participants Lyons, Linnemann, and Reid organized this meeting.

The presentations at the SAMSI Intensive session on Statistical Issues in Particle Physics included: Nancy Reid “Modifications to Profile Likelihood”; Nancy Reid “p-value Functions”; Luc Demortier “p-values from A to P”; Jim Linnemann “False Discovery Rate”; Jim Linnemann “Statistical Software Repository for Particle Physics”; Kyle Cranmer “Discovery in Presence of Nuisance Parameters”; Michael Woodroofe “Nuisance Parameters”; Pushpa Bhat “Multivariate Methods”; Harrison Prosper “Signal/Background Discrimination in Particle Physics”; Giovanni Punzi “Ordering Rules for the Neyman Construction with Nuisance Parameters”; Joel Heinrich “Limits and Nuisance Parameters”; Jim Berger “Bayesian Testing”; John Hartigan “Conditioning”; John Hartigan “Stein’s Paradox”; John Hartigan “Bayesian Priors”; Louis Lyons “p-values in Particle Physics”.

The SAMSI meeting resulted in internal notes and improved analyses, more than separate papers on the statistical methods. Thus the benefit of the SAMSI meetings is far greater than indicated by the number of papers. The Large Hadron Collider (LHC) has enormous potential to discover the Higgs boson and physics beyond the standard model. However, the experiments at the LHC are entering a new regime in terms of their data’s volume and complexity.
This poses a significant challenge to the statistical analysis of their data, and the SAMSI workshop brought together the most promising approaches for the LHC experiments. Since the workshop, Cousins and Tucker have submitted for publication a paper that further demonstrates that the most popular of our previous methods fails to perform adequately for many of the problems of the LHC. They point to several of the methods discussed at SAMSI as more appropriate solutions. Kyle Cranmer presented to the ATLAS collaboration several of the methods discussed at SAMSI for incorporation of systematic errors in new particle searches, and was subsequently appointed one of the ‘experts’ of the ATLAS statistics committee. Jim Linnemann presented the False Discovery Rate technique frequently used in Astrophysics, and it is now being considered as an integral part of one of the strategies in the search for supersymmetry.

In Particle Physics, searches for hypothesized phenomena often find no evidence for the new effect. A historic example of this is the Michelson-Morley attempt to measure the speed of the Earth with respect to the aether. More contemporary examples are the searches for the Higgs boson, Supersymmetric particles, direct observation of dark matter, etc. However, if stringent upper limits can be set on the unobserved phenomena, the null result can perhaps be used to rule out various theories. The topic of how best to set upper limits is thus important, but there is at present no consensus on how this should be done.

At the July 2006 BIRS workshop, after listing the main methods that have been proposed to set upper limits on cross sections in the presence of nuisance parameters, an attempt was made to collectively construct a matrix that listed their properties. This resulted in considerable discussion, after which it became clear that the matrix was only reasonably complete in the column marked coverage, and that only for a single channel. As all the methods had reasonable coverage properties, more information was needed for a prospective user to decide which to use. A collaborative project to supply the necessary information about each method, the Limits Challenge, was therefore initiated. Proponents of each method agreed to supply their resulting intervals for a common set of test cases. This will permit direct comparison of frequentist coverage, interval lengths, and Bayesian credibility. It was decided that 1 channel and 10 channel cases would be investigated. The results will be presented at the 2007 PHYSTAT-LHC Workshop.

The work started at SAMSI continues and a Workshop is being organized at CERN on “Statistical issues for LHC Physics”. The Large Hadron Collider (LHC) is the very high energy proton collider that is due to start operating later this year. It will hopefully make major discoveries of new elementary particles, and so the CERN Workshop will be devoted primarily to issues of quantifying the significance of observed effects, including the influence of nuisance parameters. The planning for the Workshop benefited greatly from the experience of previous Workshops in the PHYSTAT series and from discussions at SAMSI and Banff. More information will be available from the website: http://phystatlhc.web.cern.ch/phystat-lhc/index.html
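
To make the upper-limit problem discussed in this section concrete, the following Python sketch computes one of the simplest candidate answers: a Bayesian upper limit on a Poisson signal rate with a flat prior on the signal and a truncated Gaussian prior on the background (the nuisance parameter). It is only an illustration of the kind of calculation being compared, not one of the methods entered in the Limits Challenge; the function name, priors, grid sizes and defaults are all assumptions.

```python
from math import lgamma

import numpy as np


def bayes_upper_limit(n_obs, b_mean, b_sigma, cl=0.95, s_max=50.0, n_grid=2001):
    """Upper limit on signal s for n_obs ~ Poisson(s + b), flat prior on s >= 0.

    The background b has a Gaussian prior truncated at zero and is
    marginalized numerically on a grid; b_sigma = 0 means a known background.
    """
    s = np.linspace(0.0, s_max, n_grid)
    if b_sigma > 0:
        b = np.linspace(max(0.0, b_mean - 5.0 * b_sigma), b_mean + 5.0 * b_sigma, 201)
        prior_b = np.exp(-0.5 * ((b - b_mean) / b_sigma) ** 2)
    else:
        b = np.array([b_mean])
        prior_b = np.array([1.0])
    prior_b = prior_b / prior_b.sum()

    rate = s[:, None] + b[None, :]
    # Guard against log(0) when both signal and background are zero.
    log_like = n_obs * np.log(np.maximum(rate, 1e-300)) - rate - lgamma(n_obs + 1)
    posterior = (np.exp(log_like) * prior_b[None, :]).sum(axis=1)

    # Cumulative posterior on the grid; the limit is the first grid point
    # at which it reaches the requested credibility level.
    cdf = np.cumsum(posterior)
    cdf = cdf / cdf[-1]
    return float(s[np.searchsorted(cdf, cl)])
```

With no observed events and no background, the result should be close to the textbook value of -ln(0.05), about 3.0 events at 95% credibility, up to the grid resolution. Frequentist coverage of such limits would have to be checked separately, which is the kind of comparison the Limits Challenge was set up to make.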

2.10 Graduate Student Involvement

Five graduate students from US institutions and two from abroad actively participated in the Astrostatistics program at SAMSI. In addition, one student participated in the Intensive session on statistical issues in Particle Physics.

Floyd Bullard (Duke) is the SAMSI Graduate Fellow associated with the Exoplanets working group, and was one of several graduate students involved in the SAMSI Astrostatistics Program in the spring of 2006. He was an active member of the group, attending each meeting, keeping minutes of the weekly meetings, maintaining the group’s web page (http://www.samsi.info/200506/astro/workinggroup/exo/), and coding MCMC, importance sampling and other algorithms for model fitting and model selection. His graduate work is focused on activities of the working group. He has given a presentation on the search for exoplanets both in a student seminar series at Duke and at SAMSI as part of the graduate student and post-doc seminar series. At two or three working group meetings he gave brief presentations of results from his work, such as an attempt to solve a model selection problem using a new technique (integrating over a parameter space using nested sampling). Following up on the SAMSI workshop, he was a research assistant for Merlise Clyde (ISDS, Duke University) during the Fall of 2006, during which time they explored the problem of integrating over a highly multimodal space using nested sampling. He has now begun working on his Ph.D. thesis, which grew out of his participation in the SAMSI program. His thesis topic is “Improving the Efficiency of Scheduling Radial Velocity Measurements for Exoplanet Detection Using Bayes and a Fast Integral Estimator”.

Matthew Fleenor was a final year graduate student in the Physics Department, UNC, during the Astrostatistics Program. His thesis research (under Prof. James Rose, an astronomer) concerned studying dynamical and kinematical properties of galaxy clusters via spectroscopic observations of the constituent galaxies. Matt frequently attended SPS working group meetings to learn about open issues and current research on survey analysis methods. He made a special effort to visit SAMSI during the SPS intensive session, e.g., consulting with Martin Hendry. His thesis work was largely completed by the time of the SAMSI program, so the program did not directly impact his thesis work. Matt is now on the faculty in the Physics Department at Roanoke College in Virginia.

Hyunsook Lee (Penn State) is a statistics graduate student with an undergraduate background in astronomy. She attended tutorials and the astrostatistics kickoff workshop. During that time, she presented a poster, titled “Convex Hull Peeling: Nonparametric Multivariate Data Analysis.” Some other related results were presented at Interface 2006 (Detecting Outliers in Multivariate Massive Data by Convex Hull Peeling with Applications), SCMA IV (Nonparametric Approach to Multivariate Massive Data Analysis by Convex Hull Peeling), and JSM2006 (A Nonparametric Approach to Descriptive Measures of Multivariate Massive Data Based on Convex Hull Peeling Depth). After the workshop, she joined various focused working group meetings: exoplanets, source and feature detection, gravitational lensing, particle physics, and survey and population studies. She maintained the websites for the Survey and Population Studies working group, and for the Particle Physics group. She was very helpful in providing Survey and Population Studies working group astronomers with information about the strengths and weaknesses of information criteria for model selection (e.g., AIC vs.
BIC), and with information about computational geometry tools. She finished her dissertation and graduated from Penn State in 2006. She was an invaluable assistant for the closing workshop SCMA IV. Feedback on her poster presentation at the kickoff workshop was reflected in her dissertation and other later presentations. She is in the process of writing papers on model selection with a jackknife method and nonparametric massive data analysis with convex hull peeling. The first topic is theoretical in nature; the second focuses on developing algorithms for exploratory data analysis with some supporting theory. Finally, participating in the program as a graduate student led to a postdoctoral position at the Harvard-Smithsonian Center for Astrophysics, where she is the only statistician among 900 researchers.

Nicholas Robbins (Duke) maintains the public web page for the Gravitational Lensing working group. He is in the early stages of his thesis work with Professor Bray. Topics covered in the lensing session may be integrated in his thesis, but it is too early in the semester to say definitively.

Lingsong Zhang (UNC) is interested in multivariate outlier detection and functional data analysis using singular value decomposition. He was in charge of maintaining the website for the Source and Feature Detection working group, and he was also an active participant in the discussions. Lingsong has developed visualization tools for functional data, and is currently working on multi-resolution outlier detection methods for detecting outliers in long-range dependent time series, with applications in Internet anomaly detection. He was in the astrostatistics program to look for interesting astronomy applications to which he can apply his visualization tools and outlier
263 detection methods. He is also interested in developing new methodology for challenging astronomy problems. Pablo de la Cruz is working on his Ph.D. thesis under the joint supervision of Vicent Martinez (Astronomy) and Jose Miguel Bernardo (Statistics) at the University of Valencia. Pablo resided at SAMSI throughout most of the astrostatistics program, participating predominantly in the Surveys and Population Studies (SPS) group, but also in the Exoplanets group. Pablo was the youngest student participating in these groups; he was a second-year student at the time. He participated in nearly every Exoplanets and SPS working group meeting. He also interacted extensively with researchers when they visited SAMSI, often scheduling one-on-one meetings to learn about their work and methods. He prepared an extensive presentation on “Statistics for the Large Scale Structure”, providing a survey of work on quantifying 2D and 3D structure in the galaxy distribution, and reporting on work in progress with Martinez. Pablo cites his extensive personal interaction with researchers as the most important and rewarding aspect of his SAMSI participation. His peer students in statistics at Valencia for the most part get assigned research problems by their advisors after their second year. Pablo instead is exploring several possibilities together with Martinez; he credits his SAMSI visit with exposing him to a much wider variety of problems and methods than he would have otherwise known about, allowing him to play a much more active role in developing his thesis program. Also, Pablo spent considerable time at SAMSI exploring statistical computing environments, taking advantage of researchers’ varied experiences in many environments to learn about their strengths and weaknesses. He did calculations in R, C, Mathematica, and Python at SAMSI (he has settled on a combination of R and C for his thesis). He also learned MCMC algorithms and especially the importance of output diagnostics. 
As a measure of the success of the program, he notes that the closing workshop (SCMA 2006) was the first scientific meeting he has attended where he felt he really understood the majority of the topics being discussed, and felt involved with the research.

Brendon Brewer is a student in the Physics Department at the University of Sydney in Australia. His thesis research (under Prof. Geraint Lewis, an astronomer) uses Bayesian methods to address inverse problems in astronomy associated with gravitational lens and asteroseismology data. He was originally invited to participate in the gravitational lens group, but correspondence with Petters indicated that the topics the group was focusing on would not directly address his research interests. However, he was very interested in learning about Bayesian and other methods employed in the SPS and Exoplanets groups. Due to his location in Australia, remote participation was not feasible, so Brendon’s participation was limited to two weeks, when he attended the SPS and Exoplanets intensive research sessions. Brendon was particularly interested in computational techniques for model selection, a topic that arose in both the SPS and Exoplanets groups. Inspired by talks he heard at SAMSI, he pursued research on marginal likelihood methods on his return to Sydney, changing the approach he had previously taken for his work (he is presently using annealed importance sampling; related methods were pursued at SAMSI, especially by Phil Gregory). Brendon met Martin Hendry via the SPS group, and Martin invited him to the University of Glasgow to give a seminar on his thesis work. Brendon has also become interested in survey issues, particularly Malmquist bias (which may play a role in the analysis of gravitational lens systems). He discussed approaches to handling Malmquist bias with Loredo, Hendry and Chernoff, and hopes to pursue research on this topic after his thesis is completed.

Bodhisattva Sen is a graduate student in the statistics department at the University of Michigan, Ann Arbor. He is working with Michael Woodroofe and Moulinath Banerjee on his dissertation. A portion of his thesis will be on applications of statistics in High Energy Physics (more specifically, on the construction of confidence intervals in the presence of nuisance parameters in examples that arise frequently in HEP). Bodhi attended both the opening workshop on Astrostatistics (in January 2006) and the intensive session on statistical issues in Particle Physics (in March 2006). In the session on Particle Physics, Michael Woodroofe presented joint work with Bodhi Sen, “On the Unified Method with Nuisance Parameters”, which has now been submitted for publication in a statistics journal.

3. Workshops

3.1 Planning meeting

In order to begin focusing on the research topics for the Astrostatistics Program, a planning meeting was held at NASA Ames Research Center on July 14-15, 2005. Thursday, July 14, was devoted primarily to scientific discussion, including learning about the wide variety of research interests of astronomers, physicists and statisticians. Each participant had roughly 30 minutes in which to describe his/her interests or applications, although a significant portion of this time was reserved for questions and discussion. Friday, July 15, was devoted mostly to discussion of the SAMSI program itself, especially potential participants and the planning of workshops and events for the semester-long program. The participants included: Jogesh Babu (Program Leader), James Berger (Director of SAMSI), Peter Bickel (NAC Co-Chair), Floyd Bullard, Merlise Clyde, Alanna Connors, Andrew Connolly, Phil Gregory, Fabrizia Guglielmetti, Bill Jefferys, Tom Loredo, Louis Lyons, Fionn Murtagh, Don Richards, Jeff Scargle, Megan Sosey, David van Dyk, Larry Wasserman.

3.2 Opening workshop

The January 23-25, 2006 opening workshop for the program attracted 67 attendees from diverse fields, including statistics, astronomy, physics and applied mathematics, and met the goal of informing the composition and activities of the Working Groups. Details of the program are at http://www.samsi.info/workshops/2005astro-workshop200601.shtml. All the presentations at the opening workshop are available at the CASt web site, http://astrostatistics.psu.edu/samsi06/index.html#workshop.

3.3 Education and Outreach

The Astrostatistics Program began with Tutorials from 1/18/2006-1/22/2006, designed to familiarize statisticians with current trends in astronomy and expose astronomers to modern methodologies in statistics and applied mathematics. These were conducted in collaboration with CASt to prepare astronomers and statisticians for the cross-disciplinary presentations at the opening workshop. The three tutorials were:

• Bayesian Astrostatistics (led by Tom Loredo, Cornell University). This three-day session included several lectures and practicum classes, by Tom Loredo, Bill Jefferys (Universities of Texas and Vermont) and Philip Gregory (University of British Columbia), teaching 31 participants the basic theory and practice of Bayesian statistics, using examples from astronomy.

• Nonparametric statistics and Machine Learning for astronomers (Chad Schafer and Larry Wasserman, Carnegie Mellon University). This two-day tutorial introduced astronomers to modern methods in nonparametric statistics, including kernel regression, local polynomial regression, splines, wavelets, adaptive methods, and density estimation. The tutorial included implementation details in the R language. It had 24 attendees.

• Astronomy for statisticians (Bill Jefferys of Universities of Texas and Vermont, and Eric Feigelson of Penn State). This two-day tutorial reviewed the modern understanding of our universe, spanning planetary systems, stars, the Milky Way Galaxy, extragalactic astronomy and cosmology. Statistical issues underlying the astronomical studies were emphasized and discussed. It had 29 attendees.

All the presentations at the tutorials are available online at the Center for Astrostatistics web site, http://astrostatistics.psu.edu/samsi06/index.html#Tutorials. In addition to the tutorials, a Seminar Course on Astrostatistics was held during the Program, led by Jogesh Babu.

3.4 Closing Workshop

Statistical Challenges in Modern Astronomy IV (SCMA IV): The fourth in a series of interdisciplinary international research conferences, organized by Babu and Feigelson, served as the closing workshop for the SAMSI Astrostatistics program. It was held at Penn State University on June 12-15, 2006. The scientific program was divided into seven topical sessions, with Invited Speakers in astronomy or statistics accompanied by Commentators from the other discipline. The sessions were: Cosmology; Small-N problems; Astronomical surveys; Periodic variability; Recent developments in statistics; Planetary systems; and a concluding session of Cross-disciplinary perspectives on Physics by Louis Lyons (Oxford), Statistics by James Berger (Duke), and Astronomy by Ofer Lahav (University College London). Of the 104 researchers who participated in the conference, 17 were women. The participants included 18 students, all but one from US institutions. Thirty-two participants came from 16 foreign countries: United Kingdom, France, Switzerland, Australia, India, Denmark, Italy, Spain, Canada, Japan, Israel, South Africa, Hungary, New Zealand, Netherlands, and Colombia. Graduate students Hyunsook Lee and Derek Young were very helpful assistants during the conference. The proceedings, edited by Babu and Feigelson, will be published by the Astronomical Society of the Pacific.

3.5 BIRS Workshop

The program at SAMSI in March 2006 continued at Banff with a BIRS Workshop on “Statistical Inference Problems in Particle Physics and Astrophysics” in July 2006. The workshop concentrated on three topics: methods for setting upper limits; assessing statistical significance for new phenomena in the presence of nuisance parameters; and multivariate techniques for separating signal and background. A continuing activity from this meeting is the Banff Challenge, in which participants apply a wide variety of techniques for determining upper limits to data provided by Joel Heinrich. He will present a comparison of the performance of these methods (e.g., coverage, interval length, Bayesian credibility, pathologies) at the PHYSTAT-LHC Workshop in June 2007. Two of the methods are being presented at the Joint Statistical Meetings in August 2007. More information on the BIRS Workshop, including the final report, is available at http://www.pims.math.ca/birs/birspages.php?task=displayevent&event_id=06w5054

4. External Support

The SCMA IV conference was supported in part by an NSF grant to the Center for Astrostatistics and a NASA grant to G. J. Babu. The organizers also appreciate financial support from Penn State’s Outreach division, and particularly the skilled work of the conference planner John Farris. Loredo, Chernoff, Clyde and Berger are supported by an NSF grant, which partially supported their work and Floyd Bullard’s work during the program. This grant is also supporting Bullard’s thesis work, which continues some of the work of the Exoplanets Working Group. A NASA grant with Ted von Hippel (UT Austin) as PI, entitled “The Ages and Cooling Physics of White Dwarf Stars from Archival and New HST Observations”, helped support the follow-up work of the Stellar Evolution group. The funding for the BIRS/SAMSI workshop was mainly provided by BIRS. Kyle Cranmer received funding for the workshop from Brookhaven Science Associates, which manages Brookhaven National Laboratory.

5. Industrial and Governmental Participation

Because of the nature of the program, there was no industrial involvement. However, there was significant participation in the working groups from government agencies and laboratories such as NASA-Ames, NASA-Goddard, Smithsonian Astrophysical Observatory, Brookhaven National Laboratory and Fermi National Accelerator Laboratory.

6. Affiliates Participation

There were working group participants from each of the following university affiliates: University of California-Berkeley, Carnegie Mellon University, Duke University, University of Georgia, University of Michigan, University of North Carolina at Chapel Hill, Pennsylvania State University, and Purdue University.

7. Research Highlights

The past year has witnessed some impressive advances in applications of statistical methods to cosmology.

7.1 Other Earths?

Is our solar system special? In particular, are there other Earths in our Galaxy, that is, rocky planets in the habitable regions around sun-like stars? So far, over 200 planetary systems have been detected in our region of the Milky Way. The vast majority of extrasolar planets (exoplanets) are too small and dim to be seen directly. Instead, they are inferred indirectly by detecting the reflex motion of their host star: the minute “wobble” of its position on the sky due to the changing gravitational tug of a planet as it swings round its orbit. Astronomers and statisticians in the exoplanets working group worked together on developing new statistical methods to extract the complex signals from the observations. The data contain significant noise and are sparse and unevenly spaced in time. This often produces significant uncertainty in the properties of candidate planets, thwarting simple analysis methods. The exoplanets working group adopted Bayesian methods to carefully quantify and express uncertainties in planet properties (e.g., planet mass, and orbit size and ellipticity). The group also worked on the development of adaptive methods for scheduling ongoing observations of an exoplanet system, to optimize detection of a planet or estimation of a detected planet’s properties. The approach uses current, incomplete data from a system to predict its future behavior; Bayesian experimental design then uses those predictions to identify the best future observation times. The data sparseness and nonuniform sampling combine with highly nonlinear models to make the calculations challenging even for the most modern methods. The group thus created significant new methodology for Bayesian calculation with modest-dimension nonlinear models, including adaptive and population-based MCMC algorithms, and marginal likelihood estimators based on an innovative combination of MCMC output with ideas from importance sampling and locally adaptive multivariate kernel density estimation.

7.2 Quantifying Broad Patterns Across the Sky

Late last century, astronomers found evidence for a “Gamma-Ray Halo” by comparing CGRO/EGRET gamma-ray images of the whole sky to the best available physical models. In the gamma-ray sky, the most prominent sources are not “point sources” such as pulsars and active black holes, but broad, irregular swaths of diffuse emission. This gamma-ray signature essentially maps out how highly energetic particles such as cosmic rays impinge on and illuminate both irregular gas clouds and the lower-energy ambient “photon field”. A good understanding of these processes can help in predicting the Galactic diffuse gamma-ray emission, which in turn would probably help in understanding our Galactic cosmic-ray and diffuse gas environment. The challenge is in quantifying local or micro uncertainties in the images.

To tackle this challenge, the Source and Feature Detection working group used highly structured multi-level models (which probabilistically follow the path of photons through one’s telescope), plus Bayesian statistical methods, to construct images from the often limited photon-count data. These models include multi-scale mathematical components that encourage structure in the images at different levels of resolution, enabling the study of both macro and micro structures in the astronomical source. The model encourages local smoothness in the constructed images, but unlike many methods, the Bayesian procedures allow the degree of smoothing to be largely determined by the data. The Bayesian framework also allows information from multiple sources to be combined. The group developed sophisticated new computational tools tailored to these problems. Although computationally expensive, these tools leverage the highly structured model to deliver not only the best guess of an astronomical image but also a quantification of the uncertainty in the best guess. The group is developing new highly structured models tailored to the specific instrumentation and scientific questions of several NASA missions, including RHESSI (solar data), Chandra (X-ray), GLAST (gamma-ray), and EGRET (gamma-ray).

7.3 Search for New Phenomena in Particle Physics

The issue of hypothesis testing from a Bayesian perspective was the subject of a lively discussion, guided in part by an informal presentation by Jim Berger at SAMSI during March 2006. Physicist Harrison Prosper realized that the concept under discussion (Bayes factors, which require the use of proper priors) could be used in the ongoing search at Fermilab for evidence of the production of single top quarks. The key point was the realization that two well-defined hypotheses were under consideration: the Standard Model with and without single top reactions. Therefore, it was possible to compute a valid Bayes factor without ambiguity and with well-defined priors. Moreover, much of the Bayesian machinery required for the calculation of Bayes factors had already been put in place by the DØ Single Top Group. On December 8, 2006, DØ announced it had, for the first time, evidence that such reactions indeed exist. This was probably the first time that an important physics result used a Bayes factor (or rather an approximation to it, called a Bayes ratio) in the optimization of the associated analyses.
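
For readers less familiar with the term, the Bayes factor comparing two hypotheses can be written as below. This is the standard textbook definition, offered only as a sketch; it is not the specific calculation carried out by the Single Top Group. The notation is an assumption introduced here for illustration: H_0 and H_1 stand for the Standard Model without and with single top reactions, x for the observed data, and \theta_i for the parameters of hypothesis H_i with proper prior \pi_i.

```latex
B_{10} \;=\; \frac{p(x \mid H_1)}{p(x \mid H_0)},
\qquad
p(x \mid H_i) \;=\; \int p(x \mid \theta_i, H_i)\,\pi_i(\theta_i)\,d\theta_i ,
\quad i = 0, 1 .
```

The requirement of proper priors follows from this form: an improper \pi_i is defined only up to an arbitrary multiplicative constant, and that constant would carry straight through to B_{10}.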

8. Summary

The Astrostatistics program has provided a unique opportunity for extensive interaction and collaboration between astronomers and statisticians on a complex and important set of astronomical data analysis problems. There have been very few such opportunities in contemporary astronomy. The tutorials and initial contacts together taught statisticians exciting, cutting-edge astronomy, and taught astronomers the latest in statistics, including cutting-edge nonparametric and machine learning algorithms and Bayesian computational technology. The methodological needs have pushed participants in both disciplines into new territory, producing new methods that will prove useful elsewhere in statistics, and specific implementations that are already solving useful astronomical problems (improved exoplanet detection and estimation, X-ray and gamma-ray source detection), with great promise for the future. Collaborations formed during the astrostatistics program continue to flourish.

9. Publications and Technical Reports

• Babu, G. J., Mahabal, A., Williams, R., and Djorgovski, S. G. (2007). “Object detection in multi-epoch data”. To appear in the proceedings of Astronomical Data Analysis IV.

• Clyde, M. A., Berger, J. O., Bullard, F., Ford, E. B., Jefferys, W. H., Luo, R., Paulo, R., and Loredo, T. (2007). “Current Challenges in Bayesian Model Choice”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• Connors, Alanna and van Dyk, David A. (2007). “How to Win with Non-Gaussian Data: Poisson Goodness-of-Fit”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• Ford, E. B., and Gregory, P. C. (2007). “Bayesian Model Selection and Extrasolar Planet Detection”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• Ford, E. B., and Rasio, F.A. (2007). “Origins of Eccentric Extrasolar Planets: Testing the Planet-Planet Scattering Model”. Submitted to ApJ.

• Gregory, P. C. (2007). “A Bayesian Kepler periodogram detects a second planet in HD208487”. Monthly Notices of the Royal Astronomical Society, Volume 374, Issue 4, pp. 1321-1333.

• Heinrich, J., and Lyons, L. (2007). “Systematic Errors”. Annual Review of Nuclear and Particle Science, to appear.

• Jang, W., Hendry, M. (2007). “Cluster Analysis of Massive Datasets in Astronomy”. Submitted to Statistics and Computing.

• Jeffery, E. J., von Hippel, T., Jefferys, W. H., Winget, D. E., Stein, N., & DeGennaro, S. (2007). “New Techniques to Determine Ages of Open Clusters Using White Dwarfs”. ApJ, Volume 658, 391.

• Jefferys, W. H. (2007). “Current Challenges in Bayesian Model Choice: Comments”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• Lee, H. (2006). “Two Topics: A Jackknife Maximum Likelihood Approach to Statistical Model Selection and a Convex Hull Peeling Depth Approach to Nonparametric Massive Multivariate Data Analysis with Applications”. Ph.D. Thesis, Penn State University.

• Loredo, T. J. (2007). “Analyzing Data From Astronomical Surveys: Issues and Directions”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• Lyons, L. (2007). “A particle physicist’s perspective on Astrophysics”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• Maness, H. L.; Marcy, G. W.; Ford, E. B.; Hauschildt, P. H.; Shreve, A. T.; Basri, G. B.; Butler, R. P.; Vogt, S. S. (2007). “The M Dwarf GJ 436 and its Neptune-Mass Planet”. Publications of the Astronomical Society of the Pacific, Volume 119, 90-101.

• Park, Taeyoung, van Dyk, David A., and Siemiginowska, Aneta (2007). “Fitting Narrow Emission Lines in X-ray Spectra: Computation and Methods”. Under revision for the Astrophysical Journal.

• Roe, B. (2007). Nuclear Instruments and Methods A570, p. 159.

• Sen, B., Walker, M., and Woodroofe, M. “On the unified method with nuisance parameters”. Michigan preprint, submitted for publication.

• van Dyk, David A., Park, Taeyoung, and Siemiginowska, Aneta (2007). “Fitting Narrow Spectral Lines in High-Energy Astrophysics Using Incompatible Gibbs Samplers”. In ‘Statistical Challenges in Modern Astronomy IV’ (Eds: G. J. Babu and E. D. Feigelson), San Francisco, Astron. Soc. Pacific, to appear.

• von Hippel, T., Jefferys, W. H., Scott, J., Stein, N., Winget, D. E., DeGennaro, S., Dam, A., & Jeffery, E. (2006), “Inverting Color-Magnitude Diagrams to Access Precise Star Cluster Parameters: A Bayesian Approach”. ApJ, 645, 1436.

9.1 Papers in Progress

• “Upper Limits, Detection Limits, and Confidence Intervals” (David van Dyk, Vinay Kashyap, Aneta Siemiginowska, and Andreas Zezas).

• “Detection and Classification of Sunspot Groups Captured in Magnetograms” (Thomas Lee, Alex Young, Vinay Kashyap, and David van Dyk).

• “Statistical Modeling of Sunspot Cycles” (Yaming Yu, Vinay Kashyap, and David van Dyk).

• “Bayesian methods for Exoplanet Radial Velocity Data: The Kepler Periodogram and Evolutionary Markov Chain Monte Carlo” (Loredo, Chernoff, Clyde, Berger and Bullard).

• “Reconstruction of the galaxy cluster mass distribution using gravitational lensing and semi-parametric spatial mixed effects model” (L. Williams and Z. Zhu).

• “A note on measures of significance in HEP and Astrophysics: some higher order approximations” (Zi Jin, James Linnemann and Nancy Reid).

• “Likelihood inference for a problem in particle physics” (A. C. Davison and N. Sartori).

• “A Dempster-Shafer Bayesian solution to the Banff A1 Challenge” To be presented at a meeting in August 2007 (P. Edlefsen).

• “Upper limits for source detection in the three-Poisson model” To be presented at a meeting in August 2007 (P. Baines).

• “P-values: what they are and how to use them” CDF note 8662 (L. Demortier)

APPENDIX C – Workshop Participants Lists

For most of the SAMSI workshops, the participants will be summarized in three tables below. The first table is a summary of all participants by gender, status, field of work/study, affiliation, and location. The second table lists only the participants who received support. The third table lists all workshop participants. The minority status of each participant is available, but we do not include the information here because of privacy issues; the summaries in Section H: Diversity Efforts were compiled from this data.

The key to the Status entry is as follows:

NRG – New Researcher or Graduate Student
FP – Faculty or Professional
S – Students (Education & Outreach)
A – Faculty (Education & Outreach)

2005-06 PROGRAM EVENTS AFTER MAY 1, 2006

Astrostatistics Program

Astrostatistics Transition at SCMA IV (at Penn State) Workshop Participants June 12-15, 2006

Participants  Male  Female  Unspecified  Faculty/Professional  New Researcher/Student  Stat  Math  Other  Home Institution  Home State
Supported       12       0            0                     9                       3     7     0      5                12           7
Unsupported     74      17            0                    49                      42    19     3     63                62          19
SAMSI            1       0            0                     1                       0     1     0      0                NA          NA

Astrostatistics Transition at SCMA IV (at Penn State) Workshop Participants June 12-15, 2006

Last Name First Name Gender Affiliation Department Status

Adamakis Sotiris M U of Central Lancashire Physics NRG

Allen Hubert M Hubert Allen and Associates FP

Andrew Ptak M Johns Hopkins U Physics And Astronomy FP

Arenou Frederic M Observatoire de Paris-Meudon C N R S/g E P I Bactiment 11 FP

Arnaud Keith M NASA Goddard Space Flight Center FP

Arzner Kaspar M Paul Scherrer Institute Astronomy FP

Axelrod Timothy M LSST Corporation FP

Babu G Jogesh M Pennsylvania State U Statistics FP

Baddeley Adrian M U Western Australia CSIRO Mathematics and Statistics FP

Banerjee Moulinath M U of Michigan Statistics FP

Bazarghan Mahdi M IUCAA India Astronomy and Astrophysics FP

Bazot Michael M U of Aarhus Institut for Fysik Og NRG

Belbruno Edward M Princeton U Astrophysical Sciences FP

Berger James M Duke U and SAMSI Statistics FP

Bernstein Gary M U of Pennsylvania Physics and Astronomy FP

Bodhisattva Sen M U of Michigan Statistics NRG

Broos Patrick M Pennsylvania State U Astronomy NRG

Cantrell Andrew M Yale U Astronomy NRG

Cardamone Carolin F Yale U Astronomy NRG

Cayon Laura F Purdue U Physics NRG

Chatterjee Sourav M Northwestern U Physics NRG

Chen Gang M U of Hawaii Institute for Astronomy NRG

Chu I Wen Mike M NASA Goddard Space Flight Center NRG

Cignoni Michele F Osservatorio Astronomico di Capodimonte Astronomy NRG

Clyde Merlise F Duke U Statistics FP

Collet Christophe M Strasbourg U LSIIT-CNRS FP

Congdon Arthur M Rutgers U Physics and Astronomy NRG

Connors Alanna F Eureka Scientific Inc. FP

Cowan Glen M Royal Holloway U of London Physics FP

De La Cruz Martinez Pablo M Observatori Astronomic Universitat de Isnt. Investig. Pol La Coma S N NRG

Devor Jonathan M Harvard U Astrophysics NRG

Feigelson Eric M Pennsylvania State U Astronomy and Astrophysics FP

Ferramacho Luis M L A T T Laboratoire d'Astrophysique de Toulouse-Tarbes Astrophysics FP

Ford Eric M U of California-Berkeley Astronomy NRG

Freeman Peter M Carnegie Mellon U Astrophysics NRG

Ghosh Jayanta M Purdue U Statistics FP

Gregory Philip M U of British Columbia Physics FP

Hanisch Robert M Space Telescope Science Institute Astronomy FP

Hao Jiangang M U of Michigan Physics NRG

Hendry Martin M U of Glasgow Physics and Astronomy FP

High Fredrick M Harvard U Physics NRG

Hikage Chiaki M Nagoya U Physics NRG

Hinshaw Gary M NASA Goddard Space Flight Center FP

Ho Tin Kam F Bell Labs Lucent Technologies Computer Sciences FP

Hugeback Angela F U of Chicago Statistics NRG

Hunter David M Pennsylvania State U Statistics FP

Inselberg Alfred M Tel Aviv U Mathematics FP

Jalobeanu Andre M LSIITMIVPASEO Group ENSPS Parc D Innovation NRG

Jang Woncheol M U of Georgia Biostatistics NRG

Jefferys Thomas M Space Technology Astronomy NRG

Jefferys William M U of Vermont Statistics FP

Jennings Kristofer M Purdue U Statistics NRG

Johnston Kyle M Florida Institute of Technology Physics Space NRG

Kochanek Christopher M Ohio State U Astronomy NRG

Koen Chris M U of Western Cape Statistics FP

Lahav Ofer M U College London Physics and Astronomy FP

Laidler Victoria F Computer Sci Corp/Space Telescope Sci Institute Astronomy and Computer Sciences NRG

Lee Hyunsook F Pennsylvania State U Statistics NRG

Loh Ji Meng M Columbia U Statistics FP

Loredo Thomas M Cornell U Astronomy FP

Lupton Robert M Princeton U Astrophysics FP

Lyons Louis M Oxford U Particle Physics FP

Martinez Vicent M U of Valencia Astronomical Observatory FP

Morris Robin M RIACS Engineering FP

Morrison Nancy F U of Toledo Astronomy FP

Morton-jones Anthony M U of Central Lancashire Mathematics FP

Mukherjee Soma F U of Texas-Brownsville Physics FP

Nasraoui Olfa F U of Louisville Engineering and Computer NRG

Nord Brian M U of Michigan Physics NRG

Perkins Kala F Australian National U Cosmology FP

Phillips Nicholas M NASA Goddard Space Flight Center FP

Prosper Harrison M Florida State U Physics FP

Purger Norbert M Eotvos U Physics NRG

Ramsey Larry M Pennsylvania State U Astronomy FP

Rao Calyampudi M Pennsylvania State U Statistics FP

Rice John M U of California-Berkeley Statistics FP

Roever Christian M U of Auckland Statistics NRG

Rogers Benjamin M Kings College London Physics NRG

Romanishin William M U of Oklahoma Physics And Astronomy NRG

Rosenberger Jim M Pennsylvania State U Statistics FP

Rubbo Louis M Pennsylvania State U C G W P NRG

Sarro Luis M U N E D Astrophysics FP

Scargle Jeffrey M NASA Ames Research Center, Space Science Division FP

Schafer Chad M Carnegie Mellon U Statistics NRG

Smit Daniel M U of Leiden NRG

Sun Jiayang F Case Western Reserve U Statistics FP

Szapudi Istvan M U of Hawaii Institute for Astronomy FP

Takeda Genya M Northwestern U Dearborn Observatory NRG

Uribe Antonio M Observatorio Astronomico Nacional Astronomy FP

Van Dyk David M U of California-Irvine Statistics FP

Varughese Melvin M U of the Witwatersrand Statistics and Actuarial Sciences FP

Vio Roberto M Chip Computers Consulting FP

Wall Jasper M U of British Columbia Physics And Astronomy FP

Willett Rebecca F Duke U Electrical and Computer Engineering NRG

Woan Graham M U of Glasgow Physics And Astronomy FP

Woodroofe Michael M U of Michigan Statistics FP

Yen Y F F McConnell Imaging Lab FP

Yip Ching-wa F Johns Hopkins U Physics And Astronomy NRG

Young C Alex M NASA Goddard Space Flight Center NRG

Young Derek M Pennsylvania State U Statistics FP

Yu Yaming M U of California-Irvine Statistics NRG

Zhao Ou M U of Michigan Statistics NRG

Zhu Zhengyuan M U of North Carolina-Chapel Hill Statistics and Operations Research NRG

Zucker Shay M Tel Aviv U Astronomy NRG

Statistical Inference Problems in High Energy Physics and Astronomy Participant Summary July 15-20, 2006

Participants  Male  Female  Unspecified  Faculty/Professional  New Researcher/Student  Stat  Math  Other  Home Institution  Home State
Supported        0       0            0                     0                       0     0     0      0                 0           0
Unsupported     28       5            0                    30                       3    12     2     19                26           7
SAMSI            0       0            0                     0                       0     0     0      0                NA          NA

Statistical Inference Problems in High Energy Physics and Astronomy Workshop Participants (held at BIRS in Banff, Canada) July 15-20, 2006

Last Name First Name Gender Affiliation Department Status

Bailey Stephen M Lawrence Livermore National Lab Physics FP

Barlow Roger M Manchester U Physics FP

Blobel Volker M DESY Lab Physics FP

Bueno James M U of British Columbia Physics NRG

Burnett Toby M U of Washington Physics FP

Conrad Jan M Royal Institute of Technology Physics FP

Cranmer Kyle M Brookhaven National Lab Physics FP

Davison Anthony M Ecole Polytechnique Federale de Lausanne Statistics FP

Demortier Luc M Rockefeller U Physics FP

Fraser Don M U of Toronto Statistics FP

Heinrich Joel M U of Pennsylvania Physics FP

Jin Zi M U of Toronto Statistics NRG

Junk Tom M U of Illinois-Urbana-Champaign Physics FP

LePage Raoul M Michigan State U Statistics FP

Linnemann James M Michigan State U Physics FP

Lockhart Richard M Simon Fraser U Statistics FP

Lyons Louis M U of Oxford Physics FP

Marchand Eric M U of Sherbrooke Mathematics FP

Meinshausen Nicolai M U of California-Berkeley Statistics FP

Meng Xiao-Li M Harvard U Statistics FP

Narsky Ilya F California Institute of Technology Physics FP

Neal Radford M U of Toronto Statistics FP

Punzi Giovanni M U of Pisa Physics FP

Reid Nancy F U of Toronto Statistics FP

Roe Byron M U of Michigan Physics FP

Rolke Wolfgang M U of Puerto Rico Mathematics FP

Sartori Nicola F U of Venice Statistics FP

Schwienhorst Reinhard M Michigan State U Physics FP

Sen Bodhisattva M U of Michigan Statistics NRG

Siemiginowska Aneta F Harvard-Smithsonian Center for Astrophysics Astrostatistics FP

Vachon Brigitte F McGill U Physics FP

Van Dyk David M U of California-Irvine Statistics FP

Zech Gunter M U of Siegen Physics FP

Education and Outreach Program

SAMSI/CRSC Interdisciplinary Workshop for Undergraduates Participant Summary May 22-26, 2006

Participants | Male | Female | Unspecified | Faculty | Student | Stat/Math Majors | Other/Unspecified | Number of Home Colleges/Univ | Number of Home States
Supported | 15 | 8 | 0 | 0 | 23 | 20 | 3 | 16 | 13
Unsupported | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
SAMSI | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NA | NA

SAMSI/CRSC Interdisciplinary Workshop for Undergraduates Workshop Participants May 22-26, 2006

Last Name First Name Gender Affiliation Major/Department Status

Acres Jonas M Colorado State U Mathematics S

Brady Bridget F Millersville U Mathematics and Biology S

Chan Chi Ho (Anson) M State U of New York, Buffalo Economics, Mathematics S

Cho Thummim M Cornell U Mathematics, Economics S

Dornan Grant M Colorado State U Mathematics (Actuarial Science) S

Ferrulli Regal M Missouri State U Mathematics & Physics S

Gross Peter M U of Arizona Mathematics, Economics S

Guan Shuo (Stephanie) F Duke U Economics S

Gwati Ranganai (Ranga) F Reed College Mathematics, Economics S

Ho Ying Chiat M Duke U Computer Science S

Indrei Emanuel M Georgia Institute of Technology Applied Mathematics S

Kinser Adam M U of North Florida Statistics S

Kleiner Kristoph M North Carolina State U Mathematics S

Kraft Michelle F Kansas State U Mathematics S

Lou Kit Chun (Alice) F Columbia College Mathematics S

Neyer Mark P. M Xavier U Physics, Mathematics, Computer Science S

Osborn Jacob M Xavier U Mathematics, Computer Science S

Perkins Alex M U of Tennessee Computational Ecology S

Rehm Keri F Meredith College Mathematics, Religion S

Smith Terry M Shippensburg U Mathematics S

Stokes Nathan M Colorado State U Mathematics S

Tripoli Jennifer F Meredith College Mathematics S

Zapata Cheryl F North Carolina State U Applied Mathematics, Biomedical Engineering S

12th Annual Conference for African American Researchers in the Mathematical Sciences Participant Summary June 20-23, 2006

Participants | Male | Female | Unspecified | Faculty/Professional | New Researcher/Student | Stat | Math | Other | Number of Home Institutions | Number of Home States
Supported | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0
Unsupported | 48 | 19 | 0 | 37 | 30 | 2 | 35 | 27 | 37 | 19
SAMSI | 4 | 0 | 0 | 1 | 3 | 0 | 4 | 0 | NA | NA

12th Annual Conference for African American Researchers in the Mathematical Sciences Workshop Participants June 20-23, 2006

Last Name First Name Gender Affiliation Department Status

Adams III Chase M Howard U Mathematics NRG

Akogwu Onobu M Princeton U Electrical Engineering NRG

Ammons Edsel M U of North Dakota Physics FP

Apte Amit M SAMSI & U of North Carolina-Chapel Hill Mathematics FP

Ashley Caleb M Howard U Mathematics NRG

Banks Brian M BAC P FP

Barnes Earl M Georgia Institute of Technology Industrial & Systems Engineering FP

Braden Yakira F U of Wisconsin-Madison Electrical and Computer Engineering NRG

Carr Kareem M Illinois State U Physics NRG

Chandler Farrah Jackson F U of North Carolina-Wilmington Mathematics and Statistics FP

Chukwu Ethelbert N. M North Carolina State U Mathematics FP

Clemence Dominic P M North Carolina A&T State U Mathematics and Institute for Public Health FP

Currie Melvin M National Security Agency FP

Dent Gelonia F North Carolina A&T State U Engineering FP

Dixon Anthony M North Carolina State U Mathematics NRG

Eberlein Patrick M U of North Carolina-Chapel Hill Mathematics FP

Forbes Jeffrey M Duke U Computer Science FP

Frederiksen Kurt M North Carolina A&T State U Mathematics FP

Ghebremichael Musie M Yale U Epidemiology and Public Health NRG

Gracien-Orelien Katina F North Carolina State U Physical And Mathematical Sciences NRG

Grant Angela F Northwestern U Mathematics FP

Hagwood Charles M National Institute of Standards and Technology Statistical Engineering Division FP

Hampshire Robert M Princeton U Operations Research & Financial Engineering FP

Harris Clark Leona F Bennett College Mathematics and Computer Science FP

Haynes Denise F North Carolina A&T State U Mathematics & Institute for Public Health FP

Hill Raquel F Indiana U Computer Science FP

Horne Rudy M Florida State U Mathematics FP

Houston Johnny M Elizabeth City State U School of Mathematics, Science & Technology FP

Jackson Monica F American U Mathematics and Statistics FP

Jennings Otis M Duke U Fuqua School of Business FP

Jones Christopher M SAMSI FP

Kane Abdoul M U of Toronto Physiology FP

Kennedy Stephen M Carleton College Mathematics and Computer Science FP

King Donald M Northeastern U Mathematics FP

Landvater Shannon F North Carolina A&T State U Mathematics & Institute for Public Health NRG

Lewis Mark E. M Cornell U Operations Research and Industrial Engineering FP

Massey William M Princeton U Operations Research and Financial Engineering FP

McMurray Nolan M U of North Carolina-Wilmington Mathematics and Statistics FP

Middleton Jon M U of Buffalo-SUNY Mathematics NRG

Morgan Shona F North Carolina A&T State U Business Administration FP

Moshesh Irene F Howard U Mathematics NRG

Nelson Valerie F National Security Agency FP

Nkwanta Asamoah M Morgan State U Mathematics FP

Ogoubi Etienne M U of Montreal Information & Operations Research NRG

Oldham Janis F North Carolina A&T State U Mathematics FP

Onodugo Emeka M IFX Markets NRG

Pemy Moustapha M SAMSI & North Carolina State U Mathematics FP

Petters Arlie M Duke U Mathematics and Physics FP

Raphael Derrick M Princeton U Sociology NRG

Richardson Ursula C. F North Carolina A&T State U Mathematics & Institute for Public Health NRG

Rogers Charles M North Carolina State U Mathematics NRG

Somersille Stephanie F U of California-Berkeley Mathematics NRG

Stewart Luke M Duke U Mathematics and Computer Science NRG

Stovall Idris M U of Pennsylvania Mathematics FP

Tankersley Barbara F North Carolina A&T State U Mathematics FP

Teguia Alberto M Duke U Mathematics NRG

Tucker Conrad M U of Illinois-Urbana-Champaign Engineering NRG

Vernieres Guillaume M SAMSI & U of North Carolina-Chapel Hill Mathematics NRG

Weems Kimberly F North Carolina State U Statistics NRG

Williams Bryan M U of Mississippi Mathematics NRG

Williams Damon M U of Michigan Industrial & Operations Engineering NRG

Williams Kyron M Princeton U Plasma Physics Laboratory NRG

Williams Scott M U of Buffalo-SUNY Mathematics NRG

Williamson Keith M Virginia State U Engineering NRG

Wilson Adrian M U of Mississippi Mathematics NRG

Wilson Ulrica F Claremont McKenna College Mathematics FP

Winger Aris M Emory and Henry College Mathematics NRG

Young Michael M Carnegie Mellon U Computer Science NRG

2006-07 PROGRAM EVENTS AS OF APRIL 4, 2007

Development, Assessment and Utilization of Complex Computer Models

Summer School on the Design and Analysis of Computer Experiments (IRMACS, Simon Fraser U) Participant Summary August 11-16, 2006

Participants | Male | Female | Unspecified | Faculty/Professional | New Researcher/Student | Stat | Math | Other | Number of Home Institutions | Number of Home States
Supported | 19 | 7 | 0 | 10 | 16 | 18 | 5 | 4 | 15 | 10
Unsupported | 23 | 4 | 0 | 19 | 9 | 15 | 2 | 10 | 18 | 10
SAMSI | 3 | 1 | 0 | 1 | 3 | 1 | 2 | 0 | NA | NA

Summer School on the Design and Analysis of Computer Experiments (IRMACS, Simon Fraser U) Workshop Participants August 11-16, 2006

Last Name First Name Gender Affiliation Department Status

Apte Amit M SAMSI and U of North Carolina-Chapel Hill Mathematics NRG

Bautista Dianne Carrol F Ohio State U Statistics NRG

Bayarri M.J. F U of Valencia Statistics and Operations Research FP

Bengtsson Thomas M Bell Labs Statistics FP

Berger James M SAMSI FP

Booker Andrew M Boeing Company FP

Brewster John M U of Manitoba Statistics FP

Chen Tsuei-long M North Carolina State U Statistics NRG

Courbaud Benoit M Cemagref & Duke U Nicholas School of the Environment FP

Crooks James M SAMSI NRG

Dancik Garrett M Iowa State U Statistics NRG

Delaney James M Georgia Institute of Technology Industrial and Systems Engineering NRG

Frazier Marian F Ohio State U Statistics NRG

Gray Genetha F Sandia National Laboratories Computational Sciences & Mathematics Research FP

Guillas Serge M Georgia Institute of Technology Mathematics FP

Han Gang M Ohio State U Statistics NRG

Hearn Nathan M U of Chicago ASC Flash Center FP

Heaton Matthew M Brigham Young U Statistics NRG

Heitmann Katrin F Los Alamos National Laboratory FP

Higdon Dave M Los Alamos National Laboratory FP

House Leanna F Duke U Institute of Statistics and Decision Sciences NRG

Hung Ying M Georgia Institute of Technology Industrial Systems and Engineering NRG

Hutter Frank M U of British Columbia Computer Science - Artificial Intelligence NRG

Kumar Arun M Ohio State U Statistics NRG

Lan Yan M U of Michigan Statistics NRG

Lawrence Earl M Los Alamos National Laboratory Statistical Sciences FP

Lee Herbie M U of California-Santa Cruz Applied Math & Statistics FP

Lin Chunfang Devon F Simon Fraser U Statistics and Actuarial Science NRG

Lingwall Jeff M Brigham Young U Statistics NRG

Linkletter Crystal F Simon Fraser U Statistics and Actuarial Science NRG

Liu Fei F Duke U Institute of Statistics and Decision Sciences NRG

Loeppky Jason M U of British Columbia-Okanagan Mathematics and Statistics FP

Lunagomez Simon M Duke U Institute of Statistics and Decision Sciences NRG

McLeod Robert M U of Winnipeg Mathematics and Statistics FP

Michailidis George M U of Michigan Statistics FP

Myers Kary M Los Alamos National Laboratory FP

Nagy Bela F U of British Columbia Statistics NRG

Nychka Douglas M NCAR FP

O'Hagan Tony M U of Sheffield Probability and Statistics FP

Paulo Rui M U of Bristol Mathematics NRG

Pratola Matthew M Simon Fraser U Statistics and Actuarial Science NRG

Ranjan Pritam M Simon Fraser U Statistics and Actuarial Science NRG

Ravindran S.S. M U of Alabama-Huntsville Mathematical Sciences FP

Reese Shane M Brigham Young U Statistics FP

Reinman Grant M United Technologies/Pratt & Whitney Systems Engineering FP

Robinson Charles (Donald) M U of Virginia Systems and Information Engineering NRG

Sacks Jerry M NISS FP

Santner Thomas M Ohio State U Statistics FP

Spiller Elaine F SAMSI NRG

Swiler Laura F Sandia National Laboratories Optimization and Uncertainty Analysis FP

Taddy Matt M U of California-Santa Cruz Applied Math and Statistics NRG

Talluri Rajesh M Texas A&M U Statistics NRG

Welch Will M U of British Columbia Statistics FP

Williams Brian M Los Alamos National Laboratory FP

Wolpert Robert M Duke U Institute of Statistics and Decision Sciences FP

Woods Dave M U of Southampton Statistical Sciences Research Institute FP

Zhang Aijun M U of Michigan Statistics NRG

Computer Models Opening Workshop Participant Summary September 10-14, 2006

Participants | Male | Female | Unspecified | Faculty/Professional | New Researcher/Student | Stat | Math | Other | Number of Home Institutions | Number of Home States
Supported | 37 | 10 | 0 | 40 | 7 | 20 | 11 | 16 | 32 | 16
Unsupported | 61 | 15 | 0 | 56 | 21 | 28 | 16 | 25 | 29 | 13
SAMSI | 3 | 2 | 0 | 2 | 6 | 3 | 4 | 1 | NA | NA

Computer Models Opening Workshop Workshop Participants September 10-14, 2006

Last Name First Name Gender Affiliation Department Status

Anderson Richard M Duke U Nicholas School of the Environment NRG

Banks H. Thomas M North Carolina State U Center for Research in Scientific Computation FP

Bartel Donald L M Cornell U Mechanical and Aerospace Engineering FP

Bautista Dianne Carrol F Ohio State U Statistics NRG

Bayarri M.J. F U of Valencia Statistics and Operations Research FP

Bazant Martin M Massachusetts Institute of Technology Mathematics FP

Behringer Bob M Duke U Physics FP

Bengtsson Thomas M Bell Labs Statistics FP

Berliner Mark M Ohio State U Statistics FP

Bondell Howard M North Carolina State U Statistics FP

Borsuk Mark M Dartmouth College Biological Sciences FP

Burns John M Virginia Polytechnic Institute and State U Center for Applied Mathematics FP

Chen Tsuei-long M North Carolina State U Statistics NRG

Choi Jungsoon F North Carolina State U Statistics NRG

Cintron-Arias Ariel M SAMSI FP

Clark James M Duke U Nicholas School of the Environment FP

Courbaud Benoit M Duke U and Cemagref Nicholas School of the Environment FP

Cripps Ed M U of Sheffield CTCD/Probability & Statistics FP

Crooks James M SAMSI FP

Cui Tiangang M U of Auckland Engineering Science NRG

Cumming Jonathan M Durham U Mathematical Sciences FP

Dalbey Keith M State U of New York at Buffalo Mechanical and Aerospace Engineering NRG

Daniels Karen F North Carolina State U Physics NRG

Davis Jerry M North Carolina State U Marine, Earth & Atmospheric Sciences FP

DeVault Kristen F North Carolina State U Mathematics and CRSC NRG

Dietze Michael M Harvard U Organismic and Evolutionary Biology NRG

Dinwoodie Ian M Duke U Statistics FP

Drignei Dorin M National Center for Atmospheric Research Geophysical Statistics Project FP

Fuentes Montserrat F North Carolina State U Statistics FP

Gattiker James M U of Southampton and Los Alamos National Laboratory NOC FP

Goldhirsch Isaac M Tel-Aviv U Fluid Mechanics and Heat Transfer FP

Goldstein Michael M Durham U Mathematical Sciences FP

Gosling John-Paul M U of Sheffield Probability and Statistics FP

Gotwalt Chris M SAS Institute JMP FP

Gray Genetha F Sandia National Laboratories Computational Sciences & Mathematics Research FP

Greenshtein Eitan M SAMSI and North Carolina State U FP

Gremaud Pierre M North Carolina State U Mathematics FP

Grigoriu Mircea F Cornell U Civil Engineering FP

Gronewold Andrew M Duke U Nicholas School of the Environment NRG

Guda Swathi M U of North Carolina-Chapel Hill Mathematics NRG

Guillas Serge M Georgia Institute of Technology Mathematics FP

Han Gang M Ohio State U Statistics NRG

Henderson Daniel M Newcastle U Mathematics and Statistics FP

Higdon Dave M Los Alamos National Laboratory Statistical Sciences Group FP

Hohenegger Christel F U of North Carolina-Chapel Hill Mathematics FP

House Leanna F Duke U and Durham U Statistics NRG

Huang Jingfang M U of North Carolina-Chapel Hill Mathematics FP

Huber Mark M Duke U Mathematics and Statistics FP

Hung Ying M Georgia Institute of Technology School of Industrial and Systems Engineering NRG

Ide Kayo F U of California-Los Angeles Atmospheric and Oceanic Sciences FP

Joyce Jennifer F U of North Carolina-Chapel Hill Mathematics NRG

Kang Daiwen M U.S. Environmental Protection Agency ASMD/NERL FP

Kashuba Roxolana F Duke U Nicholas School of the Environment NRG

Kaufman Cari F SAMSI FP

Last Michael M National Institute of Statistical Sciences FP

Laubenbacher Reinhard M Virginia Polytechnic Institute and State U Virginia Bioinformatics Institute FP

Lee Herbie M U of California-Santa Cruz Applied Math and Statistics FP

Linkletter Crystal F Simon Fraser U Statistics and Actuarial Science NRG

Liu Delong M CIIT Centers for Health Research FP

Liu Fei F Duke U Statistics NRG

Loredo Thomas M Cornell U Astronomy FP

Lunagomez Simon M Duke U Statistics NRG

Ma Chunsheng M Wichita State U Mathematics and Statistics FP

Mallick Bani M Texas A&M U Statistics FP

Mandal Abhyuday M U of Georgia Statistics FP

Maniyar Dharmesh M Aston U Information Engineering NRG

McMahon Sean M Duke U Nicholas School of the Environment FP

Michailidis George M U of Michigan Statistics FP

Mitchell Scott M Sandia National Laboratories Optimization and Uncertainty Estimation FP

Mitran Sorin M U of North Carolina-Chapel Hill Mathematics FP

Morris Max M Iowa State U Statistics FP

Mucha Peter M U of North Carolina-Chapel Hill Mathematics and Institute for Advanced Materials FP

Myers Kary M Los Alamos National Laboratory FP

Nail Amy F North Carolina State U Statistics NRG

Nichols Nancy F U of Reading Mathematics and Meteorology FP

O'Hagan Tony M U of Sheffield Probability and Statistics FP

Otto-Bliesner Bette F National Center for Atmospheric Research Climate Change Research FP

Paciorek Christopher M Harvard School of Public Health Biostatistics FP

Pal Jayanta M SAMSI FP

Parker Wendy F U of California-San Diego Science Studies Program FP

Patra Abani M State U of New York at Buffalo Mechanical and Aerospace FP

Patterson Angela F GE Global Research FP

Pericchi Luis M U of Puerto Rico, Rio Piedras Campus Mathematics FP

Perry Mark M London School of Economics Statistics FP

Pitman Bruce M State U of New York at Buffalo Mathematics FP

Qian Zhiguang M U of Wisconsin-Madison Statistics FP

Reckhow Ken M Duke U Nicholas School of the Environment FP

Reich Brian M North Carolina State U Statistics FP

Reichert Peter M Eawag/ETH Switzerland and SAMSI FP

Reinman Grant M United Technologies and Pratt & Whitney FP

Robinson Charles M U of Virginia Systems and Information Engineering NRG

Rougier Jonathan M Durham U Mathematical Sciences FP

Sacks Jerome M NISS FP

Sain Stephan M National Center for Atmospheric Research Geophysical Statistics Project FP

Santner Thomas M Ohio State U & SAMSI Statistics FP

Schweitzer John M North Carolina State U Statistics NRG

Shearer Michael M North Carolina State U Mathematics FP

Shoemaker Christine F Cornell U Civil and Environ. Engr. and Operations Research FP

Sitter Randy M Simon Fraser U Statistics and Actuarial Science FP

Smith Leonard M London School of Economics Centre for the Analysis of Time Series FP

Spiller Elaine F SAMSI FP

Storlie Curtis M SAMSI and North Carolina State U Statistics FP

Sun Dongchu M U of Missouri Statistics FP

Swall Jenise F NOAA & U.S. Environmental Protection Agency FP

Swiler Laura F Sandia National Laboratories Optimization and Uncertainty Estimation FP

Taddy Matt M U of California-Santa Cruz Applied Math and Statistics NRG

Talih Makram M CUNY Hunter College Mathematics and Statistics FP

Talluri Rajesh M Texas A&M U Statistics NRG

Tsai Shih-Chung M General Motors Co. Global CAE Development and Integration FP

Vernieres Guillaume M SAMSI FP

Vrugt Jasper M Los Alamos National Laboratory Earth and Environmental Sciences Division FP

West Mike M Duke U Statistics FP

White Gentry M North Carolina State U Statistics FP

Wilkinson Darren M Newcastle U Mathematics and Statistics FP

Williams Cranos M North Carolina State U NRG

Winsberg Eric M U of South Florida Philosophy FP

Wojtkiewicz Steve M U of Minnesota Civil Engineering FP

Wolpert Robert M Duke U Statistics FP

Wynn Henry M London School of Economics Statistics FP

Yuan Xu North Carolina State U Statistics NRG

You Lingchong M Duke U Institute of Genome Sciences and Policy FP

Zhang Aijun M U of Michigan Statistics NRG

Zhang Tonglin M Purdue U Statistics FP

Zhu Zhengyuan M U of North Carolina-Chapel Hill Statistics and Operations Research FP

CompMod Joint Engineering and Methodology Subprogram Workshop Participant Summary October 26-27, 2006

Participants | Male | Female | Unspecified | Faculty/Professional | New Researcher/Student | Stat | Math | Other | Number of Home Institutions | Number of Home States
Supported | 3 | 1 | 0 | 3 | 1 | 3 | 1 | 0 | 3 | 3
Unsupported | 9 | 2 | 0 | 7 | 4 | 7 | 1 | 3 | 9 | 5
SAMSI | 4 | 3 | 0 | 4 | 3 | 4 | 2 | 1 | NA | NA

CompMod Joint Engineering and Methodology Subprogram Workshop Workshop Participants October 26-27, 2006

Last Name First Name Gender Affiliation Department Status

Bautista Dianne F Ohio State U Statistics NRG

Bayarri Susie F U of Valencia and SAMSI Statistics and Operations Research FP

Berger James M SAMSI FP

Bingham Derek M Simon Fraser U Statistics and Actuarial Science FP

Cintron-Arias Ariel M SAMSI NRG

Cui Tiangang M U of Auckland Engineering Science NRG

Han Gang M Ohio State U Statistics NRG

Kao Ming-Hung M U of Georgia Statistics NRG

Kaufman Cari F SAMSI and NCAR NRG

Lee Herbie M U of California-Santa Cruz Applied Math and Statistics FP

Liu Fei F Duke U Statistics NRG

Mandal Abhyuday M U of Georgia Statistics FP

Moore Leslie M. (Lisa) F Los Alamos National Laboratory Statistical Sciences Group FP

O'Hagan Anthony M U of Sheffield Probability and Statistics FP

Pitman Bruce M SUNY-Buffalo Mathematics FP

Reichert Peter M Eawag and SAMSI FP

Sacks Jerome M NISS FP

Santner Thomas M Ohio State U and SAMSI Statistics FP

Shukla Sandeep M Virginia Polytechnic and State U Electrical and Computer Engineering FP

Spiller Elaine F SAMSI NRG

Wojtkiewicz Steve M U of Minnesota Civil Engineering FP

Wolpert Robert M Duke U Statistical Science FP

CompMod Biosystems Modeling Workshop Participant Summary March 5-7, 2007

Participants | Male | Female | Unspecified | Faculty/Professional | New Researcher/Student | Stat | Math | Other | Number of Home Institutions | Number of Home States
Supported | 25 | 8 | 1 | 21 | 13 | 9 | 12 | 12 | 20 | 15
Unsupported | 13 | 5 | 4 | 6 | 16 | 6 | 5 | 11 | 7 | 3
SAMSI | 5 | 1 | 0 | 4 | 2 | 5 | 1 | 0 | NA | NA

CompMod Biosystems Modeling Workshop Workshop Participants March 5-7, 2007

Last Name First Name Gender Affiliation Department Status

Bayarri M.J. F U of Valencia and SAMSI Statistics and Operations Research FP

Berger James M SAMSI FP

Bonnet Guillaume M U of California-Santa Barbara Statistics and Applied Probability FP

Boys Richard M Newcastle U Mathematics and Statistics FP

Breen Michael M U.S. Environmental Protection Agency National Center for Computational Toxicology NRG

Breen Miyuki North Carolina State U Statistics NRG

Brown Kevin M Harvard U Molecular and Cellular Biology FP

Brown Kevin M U of North Carolina-Chapel Hill Biostatistics NRG

Capobianco Enrico M CRS4 Bioinformatics Laboratory FP

Childs Lauren F Cornell U Applied Mathematics NRG

Crooks James M SAMSI NRG

Dang Kristen F U of North Carolina-Chapel Hill Biomedical Engineering NRG

Devarapu Anilkumar M U of Louisville Mathematics FP

Dobra Adrian M U of Washington Statistics FP

El Abbasi Abdelilah M Cadi Ayyad U Biology NRG

Fricks John M Pennsylvania State U Statistics FP

Gillespie Daniel M Gillespie Consulting FP

Greenshtein Eitan M SAMSI and North Carolina State U Mathematics FP

Harang Richard M U of California-Santa Barbara Statistics and Applied Probability NRG

Henderson Daniel M Newcastle U Mathematics and Statistics FP

Isaacson Samuel M U of Utah Mathematics FP

Jaqaman Khuloud F Scripps Research Institute Cell Biology NRG

Joyce Jennifer F U of North Carolina-Chapel Hill Mathematics NRG

Kang Hye Won U of Wisconsin-Madison Mathematics NRG

Kavanaugh Laura F Duke U Program in Genetics and Genomics NRG

Ke Weiming M South Dakota State U Mathematics and Statistics NRG

Kepler Thomas M Duke U Immunology/Biostatistics & Bioinformatics FP

Kibalnik Anton M U of North Carolina-Chapel Hill Physics NRG

Knudsen Thomas M U of Louisville Birth Defects Center FP

Kou Samuel M Harvard U Statistics FP

Kuo Lynn F U of Connecticut Statistics FP

Lampoudi Sotiria F U of California-Santa Barbara Computer Science NRG

Lee Tae M Duke U Biomedical Engineering NRG

Luke Nicholas M U.S. Environmental Protection Agency FP

Mincheva Maya F U of Wisconsin-Madison Mathematics NRG

Niemi Jarad M Duke U Statistics and Decision Sciences NRG

Oliveira Rodrigo M George Mason U Krasnow Institute for Advanced Study FP

Olofsson Helen F U of North Carolina-Chapel Hill Biology FP

Pahle Juergen M EML Research Bioinformatics & Comp Biochemistry NRG

Petzold Linda F U of California-Santa Barbara Computer Science FP

Popovic Lea F Cornell U Mathematics FP

Qian Zhiguang M U of Wisconsin-Madison Statistics NRG

Qiao Xingye M U of North Carolina-Chapel Hill and SAMSI Statistics NRG

Rabitz Herschel M Princeton U Chemistry FP

Ranganatha Gayatri United Biosource Corporation FP

Reed Michael M Duke U Mathematics FP

Rempala Grzegorz A M U of Louisville Mathematics FP

Schweitzer John M North Carolina State U Statistics NRG

Seal Pradipta M Boston U Public Health NRG

Steinberg David M Tel Aviv U and SAMSI Statistics and Operations Research FP

Tan Cheemeng M Duke U Biomedical Engineering NRG

Tanouchi Yu Duke U Biomedical Engineering NRG

Todd Abby F U of North Carolina-Chapel Hill Mathematics NRG

Voit Eberhard M Georgia Institute of Technology Biomedical Engineering FP

Wilkinson Darren M Newcastle U Mathematics and Statistics FP

Wolpert Robert M Duke U Statistics and Decision Sciences FP

Yamada Richard M Cornell U Applied Mathematics NRG

Yin John M U of Wisconsin-Madison Chemical and Biological Engineering FP

Zahedi Hamed M U of Louisville Mathematics NRG

Zhang Dabao M Purdue U Statistics NRG

Zhang Min F Purdue U Statistics NRG

Zuo Peiying U of North Carolina-Chapel Hill Mathematics NRG

Joint CompMod-MUCM Mid-Program Workshop Participant Summary April 2-3, 2007

Participants | Male | Female | Unspecified | Faculty/Professional | New Researcher/Student | Stat | Math | Other | Number of Home Institutions | Number of Home States
Supported | 7 | 0 | 0 | 5 | 2 | 3 | 2 | 2 | 6 | 2
Unsupported | 10 | 0 | 0 | 9 | 1 | 5 | 3 | 2 | 11 | 4
SAMSI | 8 | 4 | 0 | 5 | 7 | 8 | 4 | 0 | NA | NA

Joint CompMod-MUCM Mid-Program Workshop Workshop Participants April 2-3, 2007

Last Name First Name Gender Affiliation Department Status

Bates Ron M London School of Economics Statistics FP

Bayarri Susie F U of Valencia and SAMSI Statistics and Operations Research FP

Berger James M SAMSI FP

Cintron Ariel M SAMSI and North Carolina State U Center for Research in Scientific Computation NRG

Crooks James M SAMSI NRG

Dalbey Keith M State U of New York at Buffalo Mechanical and Aerospace Eng NRG

Gattiker James M Southampton U MUCM FP

Gosling John Paul M U of Sheffield Probability and Statistics NRG

Guillas Serge M Georgia Institute of Technology Mathematics FP

Kaufman Cari F SAMSI NRG

Lee Herbie M U of California-Santa Cruz Applied Math and Statistics FP

Liu Fei F Duke U and SAMSI Statistics NRG

Loredo Thomas M Cornell U Astronomy FP

Ma Chunsheng M SAMSI and Wichita State U FP

Maruri-Aguilar Hugo M London School of Economics Statistics NRG

O'Hagan Anthony M U of Sheffield Probability and Statistics FP

Patra Abani M U at Buffalo-SUNY and NSF Mechanical and Aerospace Eng FP

Paulo Rui M ISEG, Technical U of Lisbon Mathematics FP

Pitman Bruce M U at Buffalo-SUNY Mathematics FP

Reichert Peter M Eawag and SAMSI FP

Rougier Jonathan M SAMSI and U of Bristol NRG

Sacks Jerome M NISS FP

Spiller Elaine F SAMSI NRG

Steinberg David M Tel Aviv U and SAMSI Statistics and Operations Research FP

White Gentry M SAMSI and North Carolina State U Statistics NRG

Wilkinson Darren M Newcastle U Mathematics and Statistics FP

Wolpert Robert M Duke U Statistical Science FP

Wynn Henry M London School of Economics Statistics FP

Zhu Zhengyuan M U of North Carolina-Chapel Hill Statistics and Operations Research FP

CompMod Terrestrial Mid-Program Meeting Participant Summary April 4, 2007

Participants | Male | Female | Unspecified | Faculty/Professional | New Researcher/Student | Stat | Math | Other | Number of Home Institutions | Number of Home States
Supported | 4 | 1 | 0 | 3 | 2 | 0 | 0 | 5 | 5 | 4
Unsupported | 4 | 3 | 1 | 2 | 6 | 0 | 0 | 8 | 3 | 2
SAMSI | 2 | 1 | 0 | 1 | 2 | 2 | 1 | 0 | NA | NA

CompMod Terrestrial Mid-Program Meeting Workshop Participants April 4, 2007

Last Name First Name Gender Affiliation Department Status

Alley Kerensa F Wake Forest U NRG

Bell Dave M Duke U NRG

Clark Jim M Duke U Environment and Earth Sciences FP

Courbaud Benoit M Cemagref and Duke U Environment and Earth Sciences NRG

Crooks James M SAMSI NRG

Dietze Michael M Harvard U Organismic and Evolutionary Biology NRG

Feeley Kenneth M Harvard U Tropical Forest Science, Arnold Arboretum Asia Prog FP

Hersh Michelle F Duke U Biology NRG

Kaufman Cari F SAMSI NRG

Metcalf Jessica F Duke U Environment and Earth Sciences NRG

Rougier Jonathan M U of Bristol and SAMSI Mathematics FP

Sain Stephan M National Center for Atmospheric Research FP

Salk Carl M Duke U NRG

Silman Miles M Wake Forest U Biology FP

Uriarte Maria F Columbia U Ecology, Evolution FP

Wu Wei Duke U Environment and Earth Sciences NRG

High Dimensional Inference and Random Matrices

Random Matrices Opening Workshop Participant Summary September 17-20, 2006

Participants | Male | Female | Unspecified | Faculty/Professional | New Researcher/Student | Stat | Math | Other | Number of Home Institutions | Number of Home States
Supported | 45 | 11 | 0 | 41 | 15 | 23 | 24 | 9 | 36 | 21
Unsupported | 72 | 13 | 0 | 69 | 16 | 48 | 27 | 10 | 36 | 15
SAMSI | 4 | 1 | 0 | 1 | 4 | 3 | 2 | 0 | NA | NA

Random Matrices Opening Workshop Workshop Participants September 17-20, 2006

Last Name First Name Gender Affiliation Department Status

Ahn Jeongyoun F U of Georgia Statistics FP

Airoldi Edo M Carnegie Mellon U Computer Science NRG

Baker Steven M U of Alabama-Birmingham Applied Mathematics NRG

Basor Estelle F California Polytechnic State U Mathematics FP

Belkin Mikhail M Ohio State U Computer Science and Engineering FP

Belov Sergei M Duke U Mathematics NRG

Bickel Peter M U of California-Berkeley Statistics FP

Bryc Wlodzimierz M U of Cincinnati Mathematical Sciences FP

Bu Sunyoung F U of North Carolina-Chapel Hill Mathematics NRG

Budhiraja Amarjit M U of North Carolina-Chapel Hill Statistics and Operations Research FP

Bunea Florentina F Florida State U Statistics FP

Cao Hongyuan F U of North Carolina-Chapel Hill Statistics and Operations Research NRG

Carvalho Carlos M Duke U Statistics and Decision Sciences FP

Casella George M U of Florida Statistics FP

Chen Aiyou M Bell Labs and Lucent Technologies Communications and Statistical Sciences FP

Cheng Guang M Duke U Statistics and Decision Sciences FP

Choup Leonard M U of California-Davis Mathematics NRG

Chu Jen-hwa M Duke U Statistics and Decision Sciences NRG

Coehlo Nate M U of California-Berkeley Statistics FP

Crooks Jim M SAMSI FP

Davis Ginger F U of Virginia Systems and Information Engineering FP

Dey Dipak M U of Connecticut Statistics FP

Dobra Adrian M U of Washington Statistics FP

Donoho David M Stanford U Statistics FP

Dumitriu Ioana F U of Washington Mathematics FP

Dunson David M NIEHS & Duke Biostatistics Branch FP

Edelman Alan M Massachusetts Institute of Technology & Interactive Supercomputing Mathematics FP

El Karoui Noureddine M U of California-Berkeley Statistics FP

Ensor Kathy F Rice U Statistics FP

Ercolani Nicholas M U of Arizona Mathematics FP

Fan Jianping M U of North Carolina-Charlotte Computer Science FP

Fan Jianqing M Princeton U Operations Research and Financial Engineering FP

Fan Yingying F Princeton U Operations Research and Financial Engineering NRG

Gao Zhenglei F Duke U Statistics and Decision Sciences NRG

Ghosal Subhashis M North Carolina State U Statistics FP

Gioev Dimitri M U of Rochester Mathematics FP

Goetze Friedrich M Bielefeld U Mathematics FP

Greenshtein Eitan M SAMSI and North Carolina State U FP

Guhr Thomas M Lunds Universitet Matematisk Fysik, LTH FP

Guillas Serge M Georgia Institute of Technology Mathematics FP

Haugh Andrew M U College Cork NRG

Hegerl Gabriele F Duke U Nicholas School of the Environment FP

Holtz Olga F U of California-Berkeley Mathematics FP

Houdre Christian M Georgia Institute of Technology Mathematics FP

Hoyle David M U of Manchester NIBHI, School of Medicine FP

Huang Jianhua M Texas A&M U Statistics FP

Huber Mark M Duke U Mathematics and Statistics and Decision Sciences FP

Huo Xiaoming M Georgia Institute of Technology Industrial and Systems Engineering FP

Ipsen Ilse F North Carolina State U Mathematics FP

Johnstone Iain M Stanford U Statistics FP

Koev Plamen M Massachusetts Institute of Technology Mathematics FP

Kolenikov Stanislav M U of Missouri-Columbia Statistics FP

Konno Yoshihiko M Japan Women's U Faculty of Science FP

SAMSI and U of Krishnapur Manjunath M North Carolina- Statistics FP Chapel Hill

Lafferty John M Carnegie Mellon U Computer Science FP

Last Michael M National Institute of Statistical Sciences FP

Lee Chihoon M U of North Carolina-Chapel Hill Statistics and Operations Research NRG

Lee Kwan M GlaxoSmithKline Statistical and Quantitative Sciences FP

Lee Myung Hee F U of North Carolina-Chapel Hill Statistics and Operations Research NRG

Lee Yoonkyung M Ohio State U Statistics FP

LeFew William M Duke U Mathematics NRG

Lei Jing F U of California-Berkeley Statistics NRG

Levina Elizaveta F U of Michigan Statistics FP

Li Lexin M North Carolina State U Statistics FP

Li Tao M Florida International U Computer Science FP

Lin Min-Hsiung M North Carolina State U Mathematics NRG

Lin Rongheng M NIEHS Biostatistics FP

Lin Xiaodong M U of Cincinnati Mathematics FP

Litherland Trevis M Georgia Institute of Technology Mathematics NRG

Liu Delong M CIIT Centers for Health Research FP

Liu Yufeng M U of North Carolina-Chapel Hill Statistics and Operations Research FP

Lu Wenbin M North Carolina State U Statistics FP

Lv Jinchi M Princeton U Mathematics NRG

Ma Chunsheng M Wichita State U Mathematics and Statistics FP

MacGibbon Brenda F Universite du Quebec a Montreal Mathematics FP

Maggioni Mauro M Duke U Mathematics FP

Markatou Marianthi F Columbia U Biostatistics FP

Marron J. S. M U of North Carolina-Chapel Hill Statistics and Operations Research FP

Marzetta Thomas M Bell Labs and Lucent Technologies Communications and Statistical Sciences FP

Massam Helene F York U Mathematics and Statistics FP

Mattingly Jonathan M Duke U Mathematics FP

Metcalfe Anthony M U College Cork Mathematics NRG

Miller Peter M U of Michigan Mathematics FP

Mingo James A. M Queen's U Mathematics and Statistics FP

Mueller Peter M M.D. Anderson Cancer Center Biostatistics FP

Mukherjee Shayan M Duke U Statistics and Decision Sciences FP

Nadler Boaz M Weizmann Institute of Science Computer Science and Applied Mathematics FP

Najim Jamal M CNRS - ENST FP

Nychka Douglas M National Center for Atmospheric Research IMAGe FP

Oraby Tamer M U of Cincinnati Mathematical Sciences NRG

Ouyang Zhi M Duke U Statistics and Decision Sciences NRG

Pal Jayanta M SAMSI FP

Park Cheolwoo M U of Georgia Statistics FP

Park Junyong M Purdue U Statistics NRG

Peche Sandrine F U of Grenoble 1 Mathematics FP

Pelletier Denis M North Carolina State U Economics FP

Perry Patrick M Stanford U Statistics NRG

Pourahmadi Mohsen M Northern Illinois U Division of Statistics FP

Qiao Xingye M U of North Carolina-Chapel Hill Statistics and Operations Research NRG

Rajaratnam Bala M Cornell U Statistics FP

Rambharat Ricky M Duke U Statistics and Decision Sciences FP

Randolph Tim M U of Washington and Fred Hutchinson Cancer Research Center Biostatistics and Biomathematics FP

Rao Raj M Massachusetts Institute of Technology Electrical Engineering and Computer Science NRG

Rashidi Far Reza M Queen's U Mathematics and Statistics NRG

Ravindran S.S M U of Alabama-Huntsville Mathematical Sciences FP

Richards Donald M Pennsylvania State U Statistics FP

Rider Brian M U of Colorado Mathematics FP

Ritov Ya'acov (Jacob) M Hebrew U of Jerusalem Statistics FP

Rumanov Igor M U of California-Davis Mathematics NRG

Schmidler Scott M Duke U Statistics and Decision Sciences FP

Schoolfield Clyde M U of Florida Statistics FP

Schwartzman Armin M Stanford U Statistics NRG

Selee Teresa F North Carolina State U Mathematics NRG

Sethuraman Jayaram M Florida State U Statistics FP

Sethuraman Sunder M Iowa State U Mathematics FP

Sharma Dhruv M North Carolina State U Statistics NRG

Shen Haipeng M U of North Carolina-Chapel Hill Statistics and Operations Research FP

Shin Hyejin M Auburn U Mathematics and Statistics FP

Silverstein Jack M North Carolina State U Mathematics FP

Smith Steven M Massachusetts Institute of Technology Lincoln Laboratory FP

Speicher Roland M Queen's U Mathematics and Statistics FP

Spiller Elaine F SAMSI FP

Stefanski Leonard M North Carolina State U Statistics FP

Sun Dongchu M U of Missouri Statistics FP

Sutton Brian M Randolph-Macon College Mathematics FP

Talih Makram M CUNY Hunter College Mathematics and Statistics FP

Tracy Craig M U of California-Davis Mathematics FP

Truong Young M U of North Carolina-Chapel Hill Biostatistics FP

Venakides Stephanos M Duke U Mathematics FP

Vu Vincent M U of California-Berkeley Statistics NRG

Wegkamp Marten M Florida State U Statistics FP

Wells Martin M Cornell U Statistical Science FP

Widom Harold M U of California-Santa Cruz Mathematics FP

Wolpert Robert M Duke U Statistics and Decision Sciences FP

Wu Yichao M Princeton U Operations Research and Financial Engineering FP

Xu Hua M Georgia Institute of Technology Mathematics NRG

Yu Bin F U of California-Berkeley Statistics FP

Zantedeschi Daniel M U of California-Santa Cruz Applied Math and Statistics NRG

Zeitouni Ofer M U of Minnesota Mathematics FP

Zhang Hao F North Carolina State U Statistics FP

Zhang Lingsong M U of North Carolina-Chapel Hill Statistics and Operations Research NRG

Zhao Ou F U of Michigan Statistics NRG

Zhao Yufan M U of North Carolina-Chapel Hill Biostatistics NRG

Zhu Hongtu M U of North Carolina-Chapel Hill Biostatistics FP

Zhu Ji M U of Michigan Statistics FP

Zhu Zhengyuan M U of North Carolina-Chapel Hill Statistics and Operations Research FP

Random Matrices Bayesian Focus Week Participant Summary October 30-November 3, 2006

Participants  Male  Female  Unspecified  Faculty/Professional  New Researcher/Student  Stat  Math  Other  Number of Home Institutions  Number of Home States
Supported     18    5       0            12                    11                      11    7     5      18                           11
Unsupported   16    3       0            11                    8                       12    4     3      12                           3
SAMSI         6     1       0            4                     3                       6     1     0      NA                           NA

Random Matrices Bayesian Focus Week Workshop Participants October 30-November 3, 2006

Last Name First Name Gender Affiliation Department Status

Airoldi Edoardo M Carnegie Mellon U Computer Science NRG

Bach Francis M Ecole des Mines de Paris FP

Bayarri Susie F U of Valencia and SAMSI Statistics and Operations Research FP

Berger James M SAMSI FP

Bogdan Malgorzata F Wroclaw U of Technology Mathematics and Computer Science FP

Carvalho Carlos M Duke U Statistics and Decision Sciences NRG

Choi Taeryon M U of Maryland-Baltimore County Mathematics and Statistics NRG

Clyde Merlise F Duke U Statistics and Decision Sciences FP

DiCiccio Thomas M Cornell U Social Statistics FP

Dobra Adrian M U of Washington Statistics FP

Drton Mathias M U of Chicago Statistics NRG

Fan Yingying F Princeton U Operations Research and Financial Engineering NRG

Ghosh Sujit M North Carolina State U Statistics FP

Greenshtein Eitan M SAMSI and North Carolina State U FP

Griffiths Tom M U of California-Berkeley Psychology FP

Inoue Lurdes F U of Washington Biostatistics FP

Jiang Dongming F U of Cincinnati Mathematical Sciences NRG

Jing Naihuan M North Carolina State U Mathematics FP

Johnstone Iain M Stanford U Statistics FP

Jun Sung M Los Alamos National Laboratory Applied Modern Physics Group FP

Krishnapur Manjunath M SAMSI NRG

Last Michael M NISS NRG

Li Zhonggai M Virginia Polytechnic Institute and State U Statistics NRG

Liang Gang M U of California-Irvine Statistics NRG

Lin Xiaodong M U of Cincinnati Mathematics NRG

Liu Jinnan F York U Mathematics and Statistics NRG

Liu Peng M North Carolina State U Statistics NRG

Liu Yufeng M U of North Carolina-Chapel Hill Statistics and Operations Research NRG

Ma Chunsheng M SAMSI and Wichita State U Statistics FP

Malouche Dhafer M Ecole Superieure de la Statistique et de l'Analyse d'Information Statistics FP

Massam Hélène F York U Mathematics and Statistics FP

Mukherjee Shayan M Duke U Statistics and Decision Sciences FP

O'Hagan Anthony M U of Sheffield Probability and Statistics FP

Pal Jayanta M SAMSI NRG

Pourahmadi Mohsen M Northern Illinois U Statistics FP

Rajaratnam Bala M SAMSI NRG

Rashidi Far Reza M Queen's U Mathematics and Statistics NRG

Richardson Thomas M U of Washington Statistics FP

Roverato Alberto M U of Bologna Statistics FP

Sivaganesan Siva M U of Cincinnati Mathematical Sciences FP

Sun Dongchu M U of Missouri-Columbia Statistics FP

Sun Xing M U of North Carolina-Chapel Hill Statistics and Operations Research NRG

Talih Makram M CUNY-Hunter College Mathematics and Statistics NRG

Wermuth Nanny F Chalmers and Gothenburg Universities Mathematical Statistics FP

West Mike M Duke U Statistics and Decision Sciences FP

Woolery David M 95th Civil Affairs Brigade, US Army Civil Information Management FP

Wu Yichao M Princeton U Operations Research and Financial Engineering NRG

Zhang Liang M Duke U Statistics and Decision Sciences NRG

Zhao Yufan M U of North Carolina-Chapel Hill Biostatistics NRG

RanMat Workshop on Large Graphical Models and Random Matrices Participant Summary November 9-11, 2006

Participants  Male  Female  Unspecified  Faculty/Professional  New Researcher/Student  Stat  Math  Other  Number of Home Institutions  Number of Home States
Supported     7     6       0            9                     4                       8     2     3      12                           4
Unsupported   7     2       0            4                     5                       2     4     3      8                            2
SAMSI         1     0       0            1                     0                       1     0     0      NA                           NA

RanMat Workshop on Large Graphical Models and Random Matrices Workshop Participants November 9-11, 2006

Last Name First Name Gender Affiliation Department Status

Ali Rebecca Ayesha F U of Guelph Mathematics and Statistics NRG

Berger James M SAMSI FP

Cox David M Nuffield College FP

de Luna Xavier M Umeå U Statistics FP

Dempster Arthur M Harvard U Statistics FP

Eberhardt Frederick M Carnegie Mellon U Philosophy NRG

Ferkingstad Egil M U of Oslo Biostatistics FP

Frangakis Constantine M Johns Hopkins U Biostatistics FP

Gottard Anna F U of Florence Statistics FP

Hammar Oscar M Gothenburg U NRG

Hansen Ben M U of Michigan Statistics NRG

Jing Naihuan M North Carolina State U Mathematics FP

Kuhnt Sonja F Eindhoven U of Technology Mathematics and Computer Science FP

Liu Peng M North Carolina State U Statistics NRG

Liu Rong F Michigan State U Statistics and Probability NRG

Luo June F Clemson U Applied Economics and Statistics NRG

Marchetti Giovanni M U of Florence Statistics FP

Massam Hélène F York U Mathematics and Statistics FP

Stanghellini Elena F U of Perugia Economics, Finance and Statistics FP

Talih Makram M CUNY - Hunter College Mathematics and Statistics NRG

Wermuth Nanny F Chalmers and Gothenburg U Mathematical Statistics FP

Wiedenbeck Michael M Centre for Survey Research and Methodology (ZUMA) FP

Zhao Yufan M U of North Carolina-Chapel Hill Biostatistics NRG

Geometry, Random Matrices and Statistical Inference Participant Summary January 16-19, 2007

Participants  Male  Female  Unspecified  Faculty/Professional  New Researcher/Student  Stat  Math  Other  Number of Home Institutions  Number of Home States
Supported     11    2       1            7                     7                       9     5     0      13                           10
Unsupported   13    4       0            8                     9                       7     4     6      9                            5
SAMSI         3     0       0            2                     1                       1     2     0      NA                           NA

Geometry, Random Matrices and Statistical Inference Workshop Participants January 16-19, 2007

Last Name First Name Gender Affiliation Department Status

Belkin Misha M Ohio State U Computer Science and Engineering NRG

Bendich Paul M Duke U Mathematics NRG

Bura Efstathia F George Washington U Statistics FP

Cao Hongyuan F U of North Carolina-Chapel Hill Statistics NRG

Damelin Steven M Georgia Southern U Mathematical Sciences FP

Dasgupta Sanjoy M U of California-San Diego Computer Science FP

Greenshtein Eitan M SAMSI FP

Jing Naihuan M North Carolina State U Mathematics FP

Koltchinskii Vladimir M Georgia Institute of Technology School of Mathematics FP

Lebanon Guy M Purdue U Statistics FP

Lee Yoonkyung F Ohio State U Statistics FP

Levina Liza F U of Michigan Statistics FP

Liang Feng F Duke U Statistics FP

Lin Xiaodong M U of Cincinnati Mathematical Sciences NRG

Lunagomez Simon M Duke U Statistics NRG

Ma Chunsheng M SAMSI & Wichita State U Mathematics and Statistics FP

Maggioni Mauro M Duke U Mathematics FP

Mileyko Yuriy M Duke U Computer Science NRG

Morozov Dmitriy M Duke U Computer Science NRG

Mukherjee Sayan M Duke U Statistics FP

Nadler Boaz M Weizmann Institute of Science Computer Science and Applied Mathematics NRG

Niyogi Partha M U of Chicago Computer Science FP

Pfeiffer Ruth F National Cancer Institute, NIH Biostatistics Branch FP

Qian Zhiguang M U of Wisconsin-Madison Statistics NRG

Qiao Xingye M U of North Carolina-Chapel Hill and SAMSI Statistics and Operations Research NRG

Qin Yingli Iowa State U Statistics NRG

Rocha Guilherme M U of California-Berkeley Statistics NRG

Rothman Adam M U of Michigan Statistics NRG

Schwartzman Armin M Harvard School of Public Health Biostatistics FP

Shi Tao M Ohio State U Statistics FP

Singer Amit M Yale U Mathematics NRG

Talih Makram M CUNY-Hunter College Mathematics and Statistics NRG

Taylor Jonathan M Université de Montréal Mathématiques et de Statistique NRG

Wu Qiang M Duke U Statistics and Decision Sciences NRG

• Summer Program on Multiplicity and Reproducibility in Scientific Studies

Multiplicity & Reproducibility in Scientific Studies Opening Workshop Participant Summary July 10-12, 2006

Participants  Male  Female  Unspecified  Faculty/Professional  New Researcher/Student  Stat  Math  Other  Number of Home Institutions  Number of Home States
Supported     6     7       0            11                    2                       13    0     0      10                           6
Unsupported   32    12      0            36                    8                       18    0     15     23                           10
SAMSI         2     1       0            1                     2                       3     0     0      NA                           NA

Multiplicity & Reproducibility in Scientific Studies Opening Workshop Workshop Participants July 10-12, 2006

Last Name First Name Gender Affiliation Department Status

Arani Ramin M Bristol-Myers Squibb Company Exploratory Development, GBS FP

Bayarri Susie F U of Valencia Statistics and Operations Research FP

Berger Jim M SAMSI FP

Berry Donald M U of Texas M. D. Anderson Cancer Center Biostatistics and Applied Mathematics FP

Cao Jing F Southern Methodist U Statistical Science FP

Chiswell Karen F GlaxoSmithKline Statistical Sciences FP

Crooks James M SAMSI FP

Denogean Lisa F SAMSI and North Carolina State U Statistics FP

Dickhaus Thorsten M German Diabetes Center Institute of Biometrics and Epidemiology FP

Dunson David M NIEHS and Duke U Statistics FP

Finner Helmut M German Diabetes Center Institute of Biometrics and Epidemiology FP

Ghosal Subhashis M North Carolina State U Statistics FP

Goodman Steve M Johns Hopkins Medical Institutions Biostatistics FP

Guindani Michele M U of Texas M.D. Anderson Cancer Center Biostatistics and Applied Mathematics FP

Hauser Elizabeth F Duke U Center for Human Genetics FP

Herring Amy F U of North Carolina-Chapel Hill Biostatistics FP

Hoff Peter M U of Washington Statistics and Biostatistics FP

Hull James M U of North Carolina-Chapel Hill Sociology and Carolina Population Center NRG

Irony Telba F Center for Devices and Radiological Health - FDA Division of Biostatistics FP

Jang Woncheol M Duke U Statistics FP

Jin Jiashun M Purdue U Statistics FP

Johnson Valen M U of Texas M.D. Anderson Cancer Center Biostatistics and Applied Mathematics FP

Kendziorski Christina F U of Wisconsin-Madison Biostatistics FP

Kosorok Michael M U of North Carolina-Chapel Hill Biostatistics FP

Krishen Alok M GlaxoSmithKline FP

Lin Danyu M U of North Carolina-Chapel Hill Biostatistics FP

Liu Delong M CIIT Centers for Health Research FP

Liu Fei F Duke U Statistics NRG

Lokhnygina Yuliya F Duke Clinical Research Institute FP

Lunagomez Simon M Duke U Statistics NRG

Luo Jingqin F Duke U Statistics NRG

MacLehose Richard M NIEHS Biostatistics Branch NRG

Madar Vered F Tel-Aviv U Statistics and Operations Research NRG

Marcus Allan M U.S. Environmental Protection Agency ORD/NCEA/IO/IRIS Program FP

Mueller Peter M U of Texas M.D. Anderson Cancer Center Biostatistics FP

Newton Michael M U of Wisconsin Statistics FP

Obenchain Robert L. (Bob) M Eli Lilly and Company Outcomes Research, US Medical FP

O'Neill Robert M U.S. Food and Drug Administration CDER/OTS/OB FP

O'Rourke Keith M Ottawa Health Research Institute Clinical Epidemiology Program FP

Ouyang Zhi M Duke U Statistics NRG

Qiu Jing F U of Missouri-Columbia Statistics FP

Rice Kenneth M U of Washington Biostatistics FP

Rodriguez Abel M Duke U Statistics NRG

Rosner Gary M U of Texas M.D. Anderson Cancer Center Biostatistics FP

Sang Huiyan F Duke U Statistics NRG

Sarkar Sanat M Temple U Statistics FP

Schwartz Scott M Duke U Statistics NRG

Scott James M Duke U Statistics NRG

Shaffer Juliet F U of California-Berkeley Statistics FP

Sivaganesan Siva M U of Cincinnati FP

Sun Lei F U of Toronto Public Health Sciences and Statistics FP

Tadesse Mahlet F U of Pennsylvania Biostatistics & Epidemiology FP

Tractenberg Rochelle F Georgetown U School of Medicine Neurology; Biostatistics; Psychiatry FP

Umbach David M NIEHS Biostatistics Branch FP

Von Holle Ann F U of North Carolina-Chapel Hill Psychiatry FP

Wasserman Larry M Carnegie Mellon U Statistics FP

Yekutieli Daniel M Tel Aviv U Statistics and Operations Research FP

Young Stan M National Institute of Statistical Sciences FP

Zhang Song M U of Texas M.D. Anderson Cancer Center Biostatistics and Applied Math FP

Zheng Xinge F Purdue U Statistics NRG

• Education and Outreach

Industrial Mathematical & Statistical Modeling Workshop for Graduate Students Participant Summary July 24-August 1, 2006

Participants  Male  Female  Unspecified  Faculty  Student  Stat/Math Majors  Other/Unspecified  Number of Colleges/Univ  Number of Home States
Supported     17    19      0            2        34       34                1                  25                       17
Unsupported   1     1       0            0        2        1                 1                  2                        2
SAMSI         0     0       0            0        0        0                 0                  NA                       NA


Industrial Mathematical & Statistical Modeling Workshop for Graduate Students Workshop Participants July 24-August 1, 2006

Last Name First Name Gender Affiliation Major/Department Status

Baker Aditi F Montana State U Mathematics S

Baker Steven (Jeff) M U of Alabama-Birmingham Applied Mathematics S

Barnard Richard M Louisiana State U Mathematics S

Belov Sergei M Duke U Mathematics S

Chen Ye M Clarkson U Mathematics S

Christov Ivan M Texas A&M U Mathematics S

Dediu Simona F Rensselaer Polytechnic Institute Mathematics S

Diver Paul M Georgetown U Chemistry S

Fertig Elana F U of Maryland Mathematics S

Gabrys Robertas M Utah State U Mathematics S

Geng Weihua M Michigan State U Mathematics S

Guevara Alvaro M Louisiana State U Mathematics S

Hariharan Aneesh Shankar M Auburn U Mathematics S

Hritcu Roxana O. S. F Ohio U Mathematics S

Jung Minkyung F Indiana U Mathematics S

King David M Arizona State U Mathematics S

Koskodan Rachel F Texas Tech U Mathematics S

Kulkarni Mandar M U of Alabama-Birmingham Mathematics S

Langville Amy F College of Charleston Mathematics A

Law Wai(Jenny) F Duke U Mathematics S

Lee Chung-min F Indiana U Mathematics S

Li Jing F U of North Carolina-Charlotte Applied Mathematics S

Li Wanying F North Carolina State U Statistics S

Mao Wenjin F U of Colorado-Boulder Applied Mathematics S

Maslova Inga F Utah State U Mathematics S

Morowitz Brent M Georgetown U Mathematics S

Morton Maureen F Michigan State U Mathematics S

Murisic Nebojsa M New Jersey Institute of Technology Mathematical Science S

Owens Luke M U of South Carolina Mathematics S

Strychalski Wanda F U of North Carolina-Chapel Hill Mathematics S

Turkmen Asuman F Auburn U Mathematics and Statistics S

Wang Jian F U of Colorado-Boulder Applied Mathematics S

Wang Ting M Kansas State U Mathematics S

Ward Carrie F Advertising.com A

Wu Qi F U of South Carolina Mathematics S

Zhang Xiaozhou(Joe) M Purdue U Statistics S

Zhao Gang M U of Louisville Electrical and Computer Engineering S

Zuo Miao F Arizona State U Mathematics S

Undergraduate Two-Day Workshop Participant Summary November 17-18, 2006

Participants  Male  Female  Unspecified  Faculty/Professional  New Researcher/Student  Stat  Math  Other  Number of Home Institutions  Number of Home States
Supported     11    9       0            1                     19                      1     14    5      15                           12
Unsupported   5     1       0            4                     2                       2     4     0      6                            6
SAMSI         3     2       0            2                     3                       1     4     0      NA                           NA

Undergraduate Two-Day Workshop Workshop Participants November 17-18, 2006

Last Name First Name Gender Affiliation Major/Department Status

Alban Douglas M Loyola College-Baltimore Mathematics S

Bajulaiye Olaniyi M Benedict College Computer Science S

Biotidara Jerome M Benedict College Electrical Engineering S

Celso Jonathan M Loyola College-Baltimore Mathematical Science S

Cho Thummim M Cornell U Mathematics and Economics S

Debes Rachel F U of Kansas Mathematics S

Dennis Justin M California State U-Chico Applied Mathematics and Statistics S

Douglas Emmeline F College of Charleston Mathematics S

Edwards Angela F North Carolina A&T State U Applied Mathematics and Economics S

El Karoui Noureddine M U of California-Berkeley and SAMSI Statistics A

Eure Amanda F Winston Salem State U Mathematics S

Greenshtein Eitan M SAMSI and North Carolina State U Center of Research in Scientific Computing A

Guillas Serge M Georgia Institute of Technology and SAMSI Mathematics A

Hamlette Tamika F Norfolk State U Computer Science and Mathematics S

Hull Patrick M Georgia Institute of Technology Applied Mathematics S

Klingenberg Bradley M U of Colorado-Boulder Applied Mathematics S

Lou Kit Chun (Alice) F Columbia College Mathematics S

Martin Elizabeth F U of Tennessee-Knoxville Mathematics S

Olorode Oluleye M Benedict College Electrical Engineering S

Oluwafemi Temitope M Benedict College Electrical Engineering S

Pallotta Michael M New Jersey Institute of Technology Mathematical Science S

Pedings Kathryn F College of Charleston Mathematics S

Peterson Katherine F U of Colorado-Boulder Applied Mathematics S

Rao Raj M Massachusetts Institute of Technology Electrical Engineering and Computer Science A

Richards Donald M Pennsylvania State U and SAMSI Statistics A

Selee Teresa F SAMSI and North Carolina State U Mathematics A

Sharma Dhruv M SAMSI and North Carolina State U Statistics A

Shuman Christa F Loyola College-Baltimore Mathematics S

Smith Ralph M SAMSI and North Carolina State U Center of Research in Scientific Computing A

Spiller Elaine F SAMSI and Duke U Mathematics A

Talih Makram M Hunter College-CUNY and SAMSI Mathematics and Statistics A

Undergraduate Two-Day Workshop Participant Summary March 2-3, 2007

Participants  Male  Female  Unspecified  Faculty/Professional  New Researcher/Student  Stat  Math  Other  Number of Home Institutions  Number of Home States
Supported     13    12      0            0                     25                      2     19    2      13                           9
Unsupported   2     0       0            1                     1                       1     0     1      5                            1
SAMSI         3     1       0            2                     3                       1     2     1      NA                           NA

Undergraduate Two-Day Workshop Workshop Participants March 2-3, 2007

Last Name First Name Gender Affiliation Major/Department Status

Badwey Becky F U of Kansas Mathematics S

Bajulaiye Olaniyi M Benedict College Computer Science S

Cintron-Arias Ariel M North Carolina State U and SAMSI Mathematics A

Dayringer Evan M Michigan State U Mathematics S

Debes Rachel F U of Kansas Mathematics S

Douglas Emmeline F College of Charleston Mathematics S

Edmonds Cory M Virginia Polytechnic and State U Statistics S

Edwards Angela F North Carolina A&T State U Applied Mathematics and Economics S

Fanelli Theresa F Pennsylvania State U Biostatistics S

Fapohunda Adenrele M Benedict College Electrical Engineering S

Fry Brendan M U of Arizona Mathematics S

Howe Sean M U of Arizona Mathematics and Creative Writing S

Johnson Callie F James Madison U Mathematics S

Kaufmann Cari F SAMSI Statistics A

Koning Kristen F Indiana U-Bloomington Mathematics and Psychology S

Malament Laura F Indiana U-Bloomington Economics, Math, History S

McMahon Sean M Duke U Environmental and Earth Sciences A

Olorode Oluleye M Benedict College Electrical Engineering S

Pedings Kathryn F College of Charleston Mathematics, Secondary Education S

Reichert Peter M EAWAG/ETH and SAMSI A

Ringer Laura F Mississippi State U Mathematics and Psychology S

Robert Michael M Mississippi State U Mathematics S

Schwartz Catherine F James Madison U Mathematics S

Schwartz Nathaniel M Virginia Commonwealth U Computer Science and Mathematics S

Seth Harshil M U of Arizona Mathematics and Finance S

Smith Ralph M North Carolina State U and SAMSI CRSC A

Sopeju Aduramigba M Benedict College Electrical Engineering S

Traud Amanda F U of North Carolina-Chapel Hill Applied Mathematics S

Wilkinson Darren M U of Newcastle Mathematics and Statistics A

Williams Chester M Indiana U-Bloomington Mathematics S

Winward Steve M James Madison U Mathematics S

• Co-Sponsored Meetings

T-O-Y Workshop: Geophysical Models (at NCAR) Participant Summary November 13-14, 2006

Participants  Male  Female  Unspecified  Faculty/Professional  New Researcher/Student  Stat  Math  Other  Number of Home Institutions  Number of Home States
Supported     0     0       0            0                     0                       0     0     0      0                            0
Unsupported   33    7       2            31                    11                      10    4     22     19                           9
SAMSI         2     2       0            2                     2                       2     2     0      NA                           NA

T-O-Y Workshop: Geophysical Models (at NCAR) Workshop Participants November 13-14, 2006

Last Name First Name Gender Affiliation Department Status

Anderson Jeff M NCAR IMAGe FP

Bayarri M.J. F U of Valencia and SAMSI Statistics and Operations Research FP

Berliner Mark M Ohio State U Statistics FP

Bingham Derek M Simon Fraser U Statistics and Actuarial Science FP

Bonan Gordon M NCAR Terrestrial Sciences FP

Bondell Howard M North Carolina State U Statistics NRG

Challenor Peter M U of Southampton National Oceanography Centre FP

Chen Tsueilong North Carolina State U Statistics NRG

Collins Nancy F NCAR IMAGe FP

Cooley Dan M NCAR IMAGe NRG

Drignei Dorin M Oakland U Mathematics and Statistics FP

Duane Greg M NCAR IMAGe FP

Feddema Johannes M U of Kansas Geography FP

Fournier Aime M NCAR IMAGe FP

Franzke Christian M NCAR IMAGe NRG

Fuentes Montserrat F North Carolina State U Statistics FP

Furrer Eva F NCAR IMAGe NRG

Furrer Reinhard M Colorado School of Mines Mathematical and Computer Sciences NRG

Gattiker James M Los Alamos National Laboratory FP

Guillas Serge M Georgia Institute of Technology Mathematics FP

Hacker Joshua M NCAR Research Application Laboratory FP

Higdon David M Los Alamos National Laboratory FP

Hoar Tim M NCAR IMAGe FP

Kaufman Cari F SAMSI and NCAR IMAGe NRG

Kim Sang Dong Kyungpook National U Mathematics FP

Kuensch Hans M Swiss Institute of Technology FP

Li Bo F NCAR IMAGe NRG

Liu Hanli M NCAR High Altitude Observatory NRG

Ma Chunsheng M SAMSI and Wichita State U Mathematics and Statistics FP

Malmberg Anders M NCAR IMAGe FP

Mengersen Kerrie F Queensland U of Technology FP

Mininni Pablo M NCAR IMAGe FP

Nychka Doug M NCAR IMAGe FP

Pouquet Annick F NCAR IMAGe FP

Raeder Kevin M NCAR IMAGe NRG

Reese Shane M Brigham Young U Statistics FP

Richmond Art M NCAR High Altitude Observatory FP

Said Yasmin F George Mason U Computational Sciences NRG

Sain Steve M NCAR IMAGe FP

Smith Leonard M London School of Economics FP

Smith Richard M U of North Carolina-Chapel Hill Statistics and Operations Research FP

Tomassini Lorenzo M EAWAG-SIAM FP

Vernieres Guillaume M SAMSI and U of North Carolina-Chapel Hill Mathematics NRG

Wegman Ed M George Mason U Applied and Engineering Statistics FP

Wiltberger Mike M NCAR High Altitude Observatory NRG

Wolpert Robert M Duke U Statistics and Decision Sciences FP

APPENDIX D – Workshop Programs and Abstracts

I. ASTROSTATISTICS

A. Statistical Challenges in Modern Astronomy IV Jun 12-15, 2006

Monday June 12

7:30 - 9:00 Registration, Poster Setup, Continental Breakfast

9:00 - 9:20 Daniel Larson, G. Jogesh Babu, Eric Feigelson (Penn State) Greetings

Cosmology I Chair: William Jefferys (Texas/Vermont)

9:20 - 10:20 Gary Hinshaw (NASA/GSFC) The Cosmic Microwave Background, WMAP and Inflation: Is ns<1?

10:20 - 10:40 Short Break

10:40 - 11:20 Istvan Szapudi (Hawaii) Spatial statistics in the cosmic microwave background maps

11:20 - 12:00 Christopher Genovese (Carnegie Mellon) & Laura Cayon (Purdue) Cosmic Microwave Background Commentaries & Discussion (Genovese, Cayon)

12:00 - 1:20 Lunch Break

Cosmology II Chair: Alanna Connors (Eureka)

1:20 - 2:00 Vicent Martinez (Valencia) Cosmic Structures: A Challenge for Astrostatistics

2:00 - 2:40 Adrian Baddeley (W Australia) Validation of Models for Spatial Point Patterns

2:40 - 3:10 Ji Meng Loh (Columbia) & Ofer Lahav (UC London) Large Scale Structure Commentaries & Discussion (Loh.pdf, Lahav.ppt)

3:10 - 3:25 Short Break

3:25 - 4:05 Christopher Kochanek (Ohio State) Turning AGN Microlensing From a Curiosity Into a Tool

4:05 - 4:45 Gary Bernstein (Penn) Statistical Challenges of Weak Gravitational Lensing

4:45 - 5:05 Jiayang Sun (Case Western Reserve) Gravitational Lensing Commentary & Discussion

6:30 - 9:00 Reception & Poster Viewing

Tuesday June 13

7:30 - 8:30 Continental Breakfast

Small-N Problems Chair: Keith Arnaud (NASA/GSFC)

8:30 - 9:10 Glen Cowan (Royal Holloway UL) The Small-N Problem in High Energy Physics

9:10 - 9:50 Harrison Prosper (Florida State) Bayesian Methods in Particle Physics: from Small-N to Large

9:50 - 10:20 Michael Woodroofe (Michigan) Commentary & Discussion On Small-N In Particle Physics

10:20 - 10:35 Short Break

10:35 - 11:15 Alanna Connors (Eureka) How to Win with Poisson Data: Imaging Gamma-Rays

11:15 - 11:35 David van Dyk (UC Irvine) Commentary & Discussion Of Small-N In Astronomy

Chair: G. Jogesh Babu (Penn State)

11:35 - 12:05 "Two minute madness" Contributed Paper Presentations

12:05 - 1:30 Lunch Break

Astronomical Surveys Chair: Donald Schneider (Penn State)

1:30 - 2:10 Thomas Loredo (Cornell) Analyzing Data from Astronomical Surveys: Issues and Directions

2:10 - 2:30 Woncheol Jang (Duke) Commentary & Discussion On Survey Methodology

2:30 - 3:10 Timothy Axelrod (Arizona) Photometric Calibration: an Intriguing Statistical Problem from the Large Synoptic Survey Telescope

3:10 - 3:30 J. K. Ghosh (Purdue) Commentary & Discussion of Photometric Calibration Problem

3:30 - 3:45 Short Break

Chair: Jasper Wall (British Columbia)

3:45 - 4:25 Robert Lupton (Princeton) The Characterisation, Subtraction, And Addition Of Astronomical Images

4:25 - 4:45 Rebecca Willett (Duke) Commentary & Discussion of Image Analysis

4:45 - 5:25 Robert Hanisch (STScI/NVO) The Virtual Observatory: Core Capabilities and Support for Statistical Analyses in Astronomy

5:25 - 5:45 G. Jogesh Babu (Penn State) Commentary & Discussion of the Virtual Observatory

7:30 - 9:30 Poster Viewing & Software Demonstrations
Thomas Loredo (Cornell) & Alanna Connors (Eureka) Python Inference Package Demonstration
Michael Yukish (Penn State) & Tin Kam Ho (Bell Labs) Data Visualization Software Demonstrations

Wednesday June 14

7:30 - 8:30 Continental Breakfast

Planetary Systems Chair: Phil Gregory (British Columbia)

8:30 - 9:10 Eric Ford (UC Berkeley) Bayesian Model Selection And Extrasolar Planet Detection

9:10 - 9:30 J. K. Ghosh (Purdue) Commentary & Discussion Of Extrasolar Planet Detection

9:30 - 10:00 William Romanishin (Oklahoma) Statistics of Optical Colors of Kuiper Belt Objects and Centaurs

10:00 - 10:20 Zhengyuan Zhu (N Carolina) Commentary & Discussion Of Solar System Minor Bodies

10:20 - 10:35 Short Break

10:35 - 11:15 Merlise Clyde (Duke) Current Challenges In Bayesian Model Choice

11:15 - 11:35 William Jefferys (Texas/Vermont) Commentary & Discussion Of Bayesian Model Selection (PDF)

Chair: G. Jogesh Babu (Penn State)

11:35 - 12:05 "Two Minute Madness" Contributed Paper presentations

12:05 - 1:30 Lunch Break

Developments In Statistics Chair: J. K. Ghosh (Purdue)

1:30 - 2:10 Christopher Genovese (Carnegie Mellon) Nonparametric Inference and the Dark Energy Equation of State

2:10 - 2:30 Eric Feigelson (Penn State) Commentary & Discussion Of Nonparametric Inference

2:30 - 3:10 Rebecca Willett (Duke) Multiscale Analysis Of Photon-Limited Astronomical Images

3:10 - 3:30 Jeffrey Scargle (NASA-Ames) Commentary & Discussion Of Photon-Limited Problems

3:30 - 3:45 Short Break

3:45 - 4:15 Michael Woodroofe (Michigan) Non-Parametric Estimation Of Dark Matter Distributions In Dwarf Spheroidal Galaxies

4:15 - 4:30 Martin Hendry (Glasgow) Commentary & Discussion On Dark Matter Mapping

4:30 - 5:30 Poster Viewing And Software Demonstrations David Hunter (Penn State) & Adrian Baddeley (W Australia) R Software Demonstration

7:00 - 8:15 Banquet

8:15 - 9:00 Alfred Inselberg (Tel Aviv/SDSC) Multidimensional Visualization And Its Applications

Thursday June 15 8:00 - 9:00 Continental Breakfast Periodic Variability Chair: James Rosenberger (Penn State)

9:00 - 9:40 John Rice (UC Berkeley) Detecting Periodicity In A Poisson Process

9:40 - 10:00 Jeffrey Scargle (NASA-Ames) Commentary & Discussion Of Poisson Periodicities

10:00 - 10:40 Chris Koen (W Cape) Periodicities In Variable Astronomical Objects

10:40 - 10:55 Short Break

10:55 - 11:35 Graham Woan (Glasgow) Periodicity In Gravitational Waves

11:35 - 12:00 John Rice (UC Berkeley) Commentary & Discussion Of Periodicity In Astronomy

12:00 - 1:20 Lunch Break Cross-Disciplinary Perspectives Chair: C. R. Rao (Penn State)

1:20 - 2:00 Louis Lyons (Oxford) Physics Perspective

2:00 - 2:40 James Berger (Duke) Statistics Perspective (SAMSI Movie)

2:40 - 3:20 Ofer Lahav (UC London) Astronomy Perspective

B. Astrostatistics Transition Workshop July 15-20, 2006

Saturday July 15, 2006
16:00 Check-in begins (Front Desk - Professional Development Centre - open 24 hours)
17:30–19:30 Buffet Dinner, Donald Cameron Hall
20:00 Informal gathering in 2nd floor lounge, Corbett Hall. Beverages and small assortment of snacks available on a cash honour system.

Sunday PLENARY
7:00–8:45 Breakfast
8:45–9:00 Introduction and Welcome to BIRS by BIRS Station Manager
9:00–9:15 Nancy Reid: Introductory Remarks
9:15–9:45 Louis Lyons: Brief introduction to Particle Physics and typical statistical analyses
9:45–10:15 Jim Linneman: Monte Carlo experiments in Physics
10:15–10:45 Coffee Break, 2nd floor lounge, Corbett Hall
10:45–11:15 Byron Roe: Setting the scene for multivariate signal/background separation
11:15–11:45 Radford Neal: Statistician’s view of the above
12:00–13:30 Lunch
13:30–14:00 Questions and Discussion
14:00–14:30 Joel Heinrich: Setting the scene for limits and nuisance parameters
14:30–15:00 Luc Demortier: Setting the scene for p-values, including nuisance parameters
15:00–15:30 David van Dyk: Statistician’s view of above
15:30–16:00 Tea, 2nd floor lounge, Corbett Hall
16:00–16:30 Xiaoli Meng: Dealing with Nuisances: Principled and Ad Hoc Methods
16:30–17:00 Discussion and Questions
17:00–17:30 Organization of Working Groups and Planning for Tuesday
17:30–19:30 Dinner

Monday WORKING GROUP DAY; group photo after lunch
7:00–8:30 Breakfast
8:30–17:30 Working Groups
10:15–10:45 Coffee Break, 2nd floor lounge, Corbett Hall
11:30–13:30 Lunch
13:00–13:45 Optional Guided Tour of The Banff Centre; meet in the 2nd floor lounge, Corbett Hall
14:00–14:15 Group Photo; meet on the front steps of Corbett Hall
15:30–16:00 Tea, 2nd floor lounge, Corbett Hall
17:30–19:30 Dinner

Tuesday PLENARY MORNING, free afternoon
7:00–8:30 Breakfast
8:30–10:30 Progress Reports from working groups
10:30–11:00 Coffee Break, 2nd floor lounge, Corbett Hall
11:30–13:30 Lunch
13:30–17:30 Free Afternoon: hike/gondola to Sulphur Mountain
17:30–19:30 Dinner

Wednesday WORKING GROUPS DAY
7:00–8:30 Breakfast
8:30–15:30 Working Groups
10:30–11:00 Coffee Break, 2nd floor lounge, Corbett Hall
11:30–13:30 Lunch
13:30–17:30 Working Groups
15:30–16:00 Tea, 2nd floor lounge, Corbett Hall
17:30–19:30 Dinner
19:30–21:00 Begin reports from working groups

Thursday PLENARY: Conclusions and next steps
7:00–8:30 Breakfast
8:30–11:30 Draft outline for report; one representative from each working group to speak
10:00–10:30 Coffee Break, 2nd floor lounge, Corbett Hall
11:30–13:30 Lunch

II. EDUCATION & OUTREACH (2005-06)

A. SAMSI/CRSC Interdisciplinary Workshop For Undergraduates May 22-26, 2006

5/21 Sunday 7:00 Welcoming Reception in Sullivan Hall

5/22 Monday
8:30 Meet at Sullivan Hall. Transport to SAMSI.
9:10 Intro. to SAMSI. Presentations follow.
9:15 National Defense and Homeland Security A. Govan
10:00 Financial Mathematics Dr. J. Rodriguez & D. Vestal
10:45 Break
11:00 Astrostatistics F. Bullard
11:45 Lunch at SAMSI
12:30 Vans transport participants to Harrelson Hall.
1:15 Intro. and Background Dr. R. C. Smith
1:45 Intro. to the Forward Problem: Solving the Harmonic Oscillator System Dr. S. Dediu
2:45 Break
3:00 Brief Introduction to the Computing System and MATLAB D. Vestal
4:30 Vans take participants to Lake Crabtree
5:00 Dinner at Lake Crabtree

5/23 Tuesday
9:00 Linear Inverse Problems: A MATLAB Tutorial J. Sloan
10:45 Break
11:00 Basic Statistical Concepts and Some Probability Essentials S. Heyward
12:15 Lunch
1:15 Introduction to Statistical Inference F. Bullard
  MATLAB Inference Example 1 (inference_example_1.m)
  MATLAB Inference Example 2 (inference_example_2.m)
  MATLAB Inference Example 3 (inference_example_3.m)
  MATLAB Inference Example 4 (inference_example_4.m)
2:45 Break
3:15 Statistical View of Linear Least Squares: A MATLAB Tutorial J. Ghosh

5/24 Wednesday
8:45 Transport from Sullivan Hall to Centennial Campus
9:00 Rotating Sessions
  Vibrating Beam Data Collection at CRSC Lab: J. David, Dr. R. C. Smith
  Graduate School Panel: Dr. Kim Weems, Statistics Department, NCSU; Dr. Ernie Stitzinger, Mathematics Department, NCSU
  Career Panel: Dr. Kevin Anstrom, Duke Clinical Research Institute; Dr. Emily Lada, SAS; Dr. Laura Potter, GlaxoSmithKline
12:00 Lunch
1:00 Reflection on the Data Collection and Modeling Experiences Drs. L. Denogean & M. Pemy
  Why Do We Need A Mathematical Model? Reflection and Discussion (presentation and handout)
2:15 Break
2:30 Solving the Vibrating Beam: Inverse Problem A. Sinko
3:00 Solving the Vibrating Beam: Optimization A. Govan
4:00 Teams Work on Inverse Problem

5/25 Thursday
9:00 Statistical Analysis for the Vibrating Beam Inverse Problem S. H. Kim & L. Zhang
10:00 Break
10:15 Alternative Beam Model Dr. R. C. Smith
11:15 Teams Work on Inverse Problem
12:30 Lunch
1:30 What could we do better? Alternative models/statistical methods Drs. L. Denogean & C. Storlie
2:30 Break
3:00 Teams Work on Inverse Problem; Begin to Prepare Reports
5:00 Participants Return to Sullivan Hall; Get Dinner
6:30 Bowling

5/26 Friday
9:00 Presentations and Discussion
10:30 Break
10:45 Presentations and Discussion
11:45 Closing Remarks & Workshop Evaluation Dr. R. C. Smith
12:00 Lunch
1:00 Participants Depart for Home

B. Twelfth Conference for African American Researchers in the Mathematical Sciences (CAARMS) June 20-23, 2006

Tuesday – June 20, 2006 5:30-8:30 PM Welcoming Reception: The Carolina Inn, 211 Pittsboro Street, Chapel Hill, NC 27516, (919) 933-2001

(Workshop Registration will be available in the lobby from 3:00-6:00 PM)

Wednesday – June 21, 2006 Phillips Hall, UNC Campus Room 215 (unless otherwise noted)

8:00-8:30 AM Breakfast and Registration – Phillips Room 330

8:30-9:00 AM Welcome and Introduction

9:00-10:00 AM “Gravity, Light, and Mathematics” Arlie Petters, Duke University

10:00-10:30 AM Coffee Break – Phillips Room 330

10:30-11:30 AM “An Instrumental Variable Approach to Estimation in Logistic Regression Measurement Error Models” Kimberly Weems, North Carolina State University

11:30-12:30 PM Lunch – Phillips Room 330

12:30-1:05 PM Announcements About Conferences, Events, Institutes and Workshops

1:05-1:15 PM Break

1:15-2:00 PM SAMSI Presentation Chris Jones, University of North Carolina, Chapel Hill

2:00-2:30 PM Group Photo

2:30-3:30 PM “Solitary Waves in Discrete Media and Four Wave Mixing Products” Rudy Horne, Florida State University

3:30-4:00 PM Coffee Break – Phillips Room 330

4:00-5:00 PM “User-Friendly Numerical Integration of Dynamical Systems: An HIV Pathogenesis Model” Dominic Clemence, North Carolina Agricultural and Technical State University

5:00-5:30 PM Break

5:30-7:30 PM Graduate Student Poster Session and Reception – Phillips Room 330

Thursday – June 22, 2006 Phillips Hall, UNC Campus Room 215 (unless otherwise noted)

8:30-9:30 AM Continental Breakfast – Phillips Room 330

9:30-10:30 AM “Dynamics of DNA in and Around a Nanopore” Charles Hagwood, National Institute of Standards and Technology

10:30-11:00 AM Coffee Break – Phillips Room 330

11:00-12:00 PM “Going for Growth: A Mathematical Economic Perspective” Ethelbert N. Chukwu, North Carolina State University

12:00-1:15 PM Lunch – Phillips Room 330

1:15-2:00 PM Tutorial: “The Ocean as a Dynamical System” Chris Jones, University of North Carolina, Chapel Hill and SAMSI

2:00-2:15 PM Break

2:15-3:15 PM “Modeling and Analysis of Social Networks” Jeffrey Forbes, Duke University

3:15-3:45 PM Coffee Break – Phillips Room 330

3:45-4:45 PM “In Search of a Good Structure: Methodology and Practices for Protein Structure Prediction” Gelonia Dent, North Carolina Agricultural and Technical State University

6:00-7:00 PM Reception - The Carolina Inn

7:00-9:00 PM Banquet – The Carolina Inn Keynote Address: “Mathematics is Global and Africa is an Integral Part of the Globe” Johnny Houston, Elizabeth City State University

Friday – June 23, 2006 Phillips Hall, UNC Campus Room 332 (unless otherwise noted)

8:30-9:00 AM Continental Breakfast – Phillips Room 330

9:00-10:00 AM “On the Classification of k-involutions of SP(n,k)” Farrah Jackson Chandler, University of North Carolina, Wilmington

10:00-10:30 AM Coffee Break – Phillips Room 330

10:30-11:30 AM “Nurse-to-Patient Ratios and Bed Capacity in Hospitals: A Queueing Perspective” Otis Jennings, Duke University

11:30-11:45 AM Walk to The Carolina Inn

11:45 AM Carolina Livery Bus Departs for SAMSI

SAMSI NISS Bldg., 19 Alexander Dr., RTP Room 104

12:30-2:15 PM Lunch

2:15-3:15 PM Presentation

3:15-3:45 PM Coffee Break

3:45-5:00 PM Panel Discussion

5:00 PM End of Conference Bus Returns to The Carolina Inn

Poster Abstracts

Chase Adams III – Howard University [email protected]

“Largeness of the Set of Finite Sums of Sequences in N”

In 1974, Neil Hindman established that whenever N is partitioned into finitely many cells, one of these must contain FS(⟨x_n⟩_{n∈N}) = { Σ_{t∈F} x_t : F ∈ P_f(N) } for some sequence ⟨x_n⟩_{n∈N} in N, where P_f(N) is the set of finite nonempty subsets of N. Subsequently, a proof of the Finite Sums Theorem involving the algebraic structure of βN, the Stone-Čech compactification of N, was provided by Fred Galvin and Steven Glazer. It established an intimate relationship between finite sums in N and idempotents in βN. In (N,+), a set A is syndetic if and only if it has bounded gaps, and a set A is piecewise syndetic if and only if there exist a fixed bound b and arbitrarily long intervals in which the gaps of A are bounded by b. We say a set A ⊆ N is central if and only if there is some idempotent in the smallest ideal of βN that contains A. Under the direction of Neil Hindman, I used combinatorial and algebraic methods to investigate when FS(⟨x_n⟩_{n∈N}) is piecewise syndetic, syndetic, or central.

Onobu Akogwu – Princeton University [email protected]

“Instabilities and Pattern Formation in Crystal Growth”

Several common modes of crystal growth provide particularly simple and elegant examples of spontaneous pattern formation in nature. Phenomena of interest are those in which an advancing solidification front suffers instability and subsequently reorganizes itself into a more complex mode of behavior. The systems examined here are those in which solidification is controlled entirely by a single diffusion process, either by the flow of latent heat away from a moving interface or by the analogous redistribution of chemical constituents. Convective effects and crystalline anisotropy are ignored. The Mullins-Sekerka linear stability analysis is used to analyze simple planar and spherical interfaces and the special case of directional solidification. These techniques can be extended to freely growing dendrites, which leads to an understanding of the side-branching and tip-splitting instabilities that occur naturally in snowflakes.

Caleb Ashley – Howard University [email protected]

“A Classical Case of Knots in R3: A Theorem Attributed to Gauss”

In a Calculus 3 course, many integral theorems are given which are related to physical phenomena, for example Gauss' Divergence Theorem. After one year in Algebraic Topology, I have encountered a variety of new characters which are tools for generalizing calculus to higher dimensions. This poster presentation highlights the results of the classical integral equation of Gauss with a corresponding theorem which is stated and proved using characters and techniques of algebraic topology. Ultimately, the linking number of two manifolds is expressed by Gauss' Integral.

Yakira Braden – University of Wisconsin-Madison [email protected]

“The Threat of Predatory Generation Control to the U.S. Electric Power Infrastructure”

As structural changes for electric power in the U.S. and world-wide occur, market competition heightens for the provision of generation services. To increase market share, an unethical energy company owning generation might be motivated to exercise “predatory control,” to undermine perceived reliability of existing competitors. Perhaps more seriously, a malicious intruder, “hacking” controller software, could induce widespread blackouts via predatory control. Synchronous generators connected in an AC network form a single dynamical system with strong coupling across considerable geographic distances. This natural coupling between synchronous machines offers an opportunity for predatory control to take place. Predatory control occurs when a group of generators destabilizes competing machines in the network while minimizing the impact of the unstable mode within the group exercising control and maintaining nearly satisfactory performance.

Kareem Carr – Illinois State University [email protected]

“Resonance in a Driven, Relativistic Harmonic Oscillator”

In many undergraduate texts, the phenomenon of resonance is explored for a harmonic oscillator in the non-relativistic case. The physically impossible conclusion of the non-relativistic analysis is that as the frequency of the driving force approaches the so-called natural frequency of a harmonic oscillator, the amplitude of its oscillations approaches infinity. To our knowledge, the relativistic case has not been studied. For this case, we have shown, using numerical integration, that: 1. Resonance occurs below the natural frequency of the system. 2. There is a region of bistability that does not occur in the non-relativistic case. 3. The maximum amplitude does not approach infinity at resonance. By applying the method of multiple scales for nonlinear oscillators, and other techniques, we approximate the resonance function for this phenomenon and explore the underpinnings of the novel resonance behaviors introduced by the relativistic nonlinearity.

Abdoul Kane – University of Toronto [email protected]

“Propagating Bursts in a Model of the Subthalamo-Pallidal Loop”

We use a biophysically constrained model to study the propagation of waves in the subthalamo-pallidal network, a neuronal pacemaker involved in the planning and control of movement. Within that network the membrane potential of subthalamic and pallidal cells displays activity over many time scales. This makes it difficult to carry out a straightforward computational or analytical investigation. By combining perturbative and asymptotic methods we describe a general technique for reducing such complex network equations to simpler yet biophysically relevant systems. We then use the reduced model to formulate conditions for structured self-sustained propagation and compute the functional dependence of the propagation speed on network parameters.

Jon Middleton–State University of New York at Buffalo [email protected]

“On Rational Embeddings of Countable Ordinals”

It is possible to embed countable ordinals into the set of rational numbers with the usual order. We prove the existence of such embeddings and exhibit various methods of explicitly representing ordinal numbers via an embedding.
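As a small illustration of the idea (not the presenter's construction), the sketch below embeds the ordinals below ω² into the rationals. It assumes each such ordinal is written as ω·m + n and represented by the hypothetical pair (m, n); the map `embed` is chosen here purely for illustration.

```python
from fractions import Fraction
from itertools import product


def embed(m, n):
    """Send the ordinal omega*m + n (m, n natural numbers) to the rational m + n/(n+1)."""
    return m + Fraction(n, n + 1)


# Ordinals below omega^2 are ordered like pairs (m, n) compared lexicographically.
pairs = sorted(product(range(6), range(50)))
images = [embed(m, n) for m, n in pairs]

# The embedding preserves order if the images are strictly increasing.
order_preserved = all(a < b for a, b in zip(images, images[1:]))
print(order_preserved)
```

The check only covers a finite sample of pairs, but the argument is general: for fixed m the values m + n/(n+1) increase with n and stay below m + 1, so the block for m sits entirely below the block for m + 1.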

Irene Moshesh – Howard University [email protected]

“Image Partition Regularity of Affine Transformations”

In his famous 1933 paper, Richard Rado studied partition regularity of systems of linear equations. That is, given a system of equations

a_{1,1} x_1 + a_{1,2} x_2 + . . . + a_{1,v} x_v = b_1
a_{2,1} x_1 + a_{2,2} x_2 + . . . + a_{2,v} x_v = b_2
. . .
a_{u,1} x_1 + a_{u,2} x_2 + . . . + a_{u,v} x_v = b_u

and given a finite partition of the set N of positive integers, could one guarantee a solution set {x_1, x_2, . . . , x_v} contained in one cell of the partition? In coloring terminology, one is asking whether, whenever N is finitely colored, there must be a monochromatic solution set. In the case where the system of equations is homogeneous, in other words when b = 0, Richard Rado gave the following definition:

Definition: Let u, v ∈ N and let A be a u × v matrix with entries from Q. Let S be one of N, Z, or Q. Then A is kernel partition regular over S (KPR/S) if and only if, whenever S \ {0} is finitely colored, there must exist monochromatic x ∈ S^v such that Ax = 0.

Rado was able to establish precisely when a u × v matrix A with entries in Q was kernel partition regular. This characterization was of importance since one could use kernel partition regularity to prove, for example, Schur’s Theorem, published in 1916, which states that whenever N is finitely colored there must exist monochromatic x, y, and x + y. However, kernel partition regularity fails to establish van der Waerden’s Theorem, which states: Whenever N is finitely colored and ℓ ∈ N is given, there exist a and d such that a, a + d, a + 2d, . . . , a + ℓ·d are all the same color. In order to prove van der Waerden’s Theorem using kernel partition regularity, one must require that the increment d also be the same color.

Neil Hindman and Imre Leader used image partition regularity, a notion introduced by Deuber, to establish van der Waerden’s Theorem without requiring the increment d to be the same color. Image partition regularity is a nice alternative to kernel partition regularity because it guarantees nonconstant solutions, is computable and offers a canonical matrix representation. In 1993, Neil Hindman and Imre Leader characterized when a u × v matrix A with entries from Q is image partition regular over N. Just as Rado considered the kernel partition regularity of the affine transformation x → A·x + b, one can ask about the image partition regularity of the same transformation. This matter will be the focus of this talk.
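Schur’s Theorem, mentioned in the abstract above, can be checked by brute force for two colors on a small initial segment of N. The sketch below is an illustration only and is not part of the talk; the function names are hypothetical, and x = y is allowed in the triple x, y, x + y.

```python
from itertools import product


def has_monochromatic_schur_triple(coloring):
    """coloring[i] is the color of the integer i + 1; look for x, y, x + y of one color."""
    n = len(coloring)
    for x in range(1, n + 1):
        for y in range(x, n + 1):
            s = x + y
            if s <= n and coloring[x - 1] == coloring[y - 1] == coloring[s - 1]:
                return True
    return False


def every_coloring_forced(n, colors=2):
    """True if every coloring of {1, ..., n} contains a monochromatic x, y, x + y."""
    return all(
        has_monochromatic_schur_triple(c)
        for c in product(range(colors), repeat=n)
    )


print(every_coloring_forced(4))
print(every_coloring_forced(5))
```

With two colors, {1, 4} and {2, 3} avoid a monochromatic triple on {1, ..., 4}, so the first call reports False, while every two-coloring of {1, ..., 5} is forced to contain one, so the second reports True.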

Wilfred Ndifon – Princeton University [email protected]

“Putative Self-Organized Criticality of RNA Molecules and RNA Viral Genomes”

The distribution of the number of structural rearrangements of an RNA molecule is shown to be log-normal over short time scales, and to converge asymptotically to a power-law, with a scaling exponent 0 < γ < 2, which is believed to be characteristic of self-organized critical systems. This suggests that RNA molecules and RNA viral genomes may exist in a self-organized critical state and predicts, among other things, a scale-invariant distribution of fluctuations in viral fitness during evolutionary stasis. Interestingly, this prediction is borne out by the results of virus evolution experiments. (This presentation is based on joint work with Dr. Santiago Elena of the Spanish National Research Council and Dr. Simon Levin of Princeton University).

Etienne Ogoubi – Université de Montréal [email protected]

“Near-Complete Decomposability Concept for On-Chip Multiprocessor Architecture”

Partitioning applications using Domain Decomposition (DD) for a multiprocessor architecture is seen as a powerful technique. Many areas of science have extensively and successfully used this technique to solve problems of great size and complexity. A part of our research tackles the Near-Complete Decomposability (NCD) concept, with its associated technique of aggregation of variables, to model and then solve problems based on a parallel on-chip multiprocessor architecture with reconfigurable devices. This approach is based on conjectures made from analogies between aggregates in nearly completely decomposable systems and computational sub-applications.

Olawale O. Oladehin – Princeton University [email protected]

“Data Driven Strategic Gaming”

What attributes are required to play a game well? Of course all sports require talent, but exactly how do strategies and different approaches affect or benefit a player? For this project we separate the aspects of games into three different categories: facts, tactics, and strategy. Each player must know the facts and rules of the game and must then apply those facts to create different tactics for a specific moment in the game; strategy then becomes the culmination of the facts and tactics that a player gains over time. This strategy is the attribute that the player will use to view the game in its entirety. Through the use of game theory, data mining, and simulation, we will approach a very popular card game, Texas Hold ‘em. The approach for this project will be to create a database of basic poker knowledge (pot odds, starting hand values, etc.). This basic knowledge will then be linked to several components in order to create an artificial “short-term” and “long-term” memory for the poker player. The goal is to be able to train a computer program to learn and improve at the game of poker through learning and accumulating “knowledge.”

Derrick Raphael – Princeton University [email protected]

“A Collective Analysis of African American Researchers in the Mathematical Sciences”

For this project, I am conducting brief surveys of Black mathematicians attending CAARMS12 at UNC-Chapel Hill. I also plan to conduct extended interviews with various participants to get a more in-depth view of their paths to becoming mathematicians. In addition to gleaning information from participants, I will incorporate various past studies on this issue and other relevant studies to get a more informed view of these unique individuals. Overall, this study seeks to find out what building blocks help to produce Black mathematicians.

Charles Rogers – North Carolina State University [email protected]

“The Effect of Alcohol on Neuron Firing”

Neurons are responsible for transmitting messages throughout the body via long distance electrical signals known as action potentials. These depend on the active transport of sodium and potassium ions across the cell membrane. The effect of various drugs on the process of neuron firing is a current research interest. The Hodgkin-Huxley equations, a system of four nonlinear ordinary differential equations, mathematically model the influx and efflux of these ions across the cell membrane. In the presence of alcohol, the release of potassium ions is accelerated. We propose a modified version of these equations, which incorporates the effect of alcohol, and examine its implications through mathematical analysis in dynamical systems. We investigate the qualitative behavior and interpret the results of the steady-state solutions in the fast and fast-slow phase planes. This is joint work with Jeannine T. Abiva, Edna S. Joseph, and Arpy K. Mikaelian, with faculty mentors Erika T. Camacho and Stephen A. Wirkus.

Stephanie Somersille – University of California at Berkeley [email protected]

“Adjoints of Composition Operators on Hilbert Spaces”

Many seemingly basic questions regarding composition operators on Hilbert spaces remain open. One such problem is finding explicit formulas for their adjoints. We will consider a new class of functions, an extension of composition, weighted composition and Toeplitz operators to include operators derived from multiple valued functions. We will discuss their properties such as boundedness and compactness. We will give a formula for the adjoints of composition operators on Hardy Hilbert spaces whose symbols are rational functions as composites of such (weighted) composition operators and backward shift operators.

Luke J. Stewart – Duke University [email protected]

“Solving Numerically the Lens Equation in Stochastic Gravitational Lensing”

Solving analytically the lens equation in gravitational lensing tends to be impossible, especially as the parameters of the lensing system are allowed to become more complicated. Therefore, numerical methods are used to solve the lens equation for the set of lensed images that are produced for a given light source and lensing parameters. The goal of my project is to determine the mean and variance of the number of lensed images produced when the parameters are varied uniformly, as in the case of stochastic gravitational lensing.

Alberto Mokak Teguia – Duke University [email protected]

“A Sierpiński Graph and Some of its Properties”

The Sierpiński fractal or Sierpiński gasket Σ is a familiar object studied by specialists in dynamical systems and probability. In this paper, we consider a graph S_n derived from the first n iterations of the process that leads to Σ and study some of its properties, including the cycle structure, domination number and pebbling number. Various open questions are posed. This is joint work with Anant P. Godbole.

Conrad Tucker – University of Illinois, Urbana-Champaign [email protected]

“Analytical Design of Product Family Cell Phone Architecture”

Product platform design enables companies to minimize manufacturing and design costs by allowing a family of products to be developed around a shared, efficient product architecture. Under this product family design paradigm, this paper introduces a method to analytically determine the optimal product platform configuration in the multi-product hierarchy that will evolve in upcoming generations of product launch. By sharing the most common and labor intensive components, the cost of a product together with its predecessors can be drastically reduced, thereby appealing more to market demand. Multilevel, multidisciplinary optimization has become an effective alternative for solving complex, large-scale system design problems that are conventionally solved by an all-in-one (AIO) approach. To link product design and product planning effectively, however, the traditional single-stage static formulation should be expanded to a multistage formulation to model the changing product specifications and market demand. In this article, a product family design model that optimizes the overall enterprise objective spanning multiple stages of product launch is presented. At each stage, the hierarchical product family design and planning problems are modeled using analytical target cascading (ATC) and are further expanded to accommodate changing design variables and parameters for different stages. Multistage modeling allows for simultaneous and independent optimization of each individual stage system design while still maintaining the overall enterprise objective. The motivation of this research is to explain how the ATC methodology can be extended in a multistage setting to include the optimization of engineering designs for a product family in an extremely volatile and competitive market space. To illustrate the methodology, an automotive vehicle brake system design and a cellular phone family design will be presented.

Brian Williams – University of Mississippi [email protected]

“Large Circuit Pairs in Matroids”

Scott Smith conjectured in 1979 that two distinct longest cycles of a k-connected graph meet in at least k vertices when k ≥ 2. This conjecture is known to be true for k ≤ 10. Only the case k ≤ 6 appears in the literature, however. Reid and Wu generalized Smith’s conjecture to k-connected matroids by considering largest circuits. The case k = 2 of the matroid conjecture follows from a result of Seymour. In addition, McMurray, Reid, Sheppardson, Wei, and Wu established an extension of the matroid conjecture for k = 2 and proved it for co-graphic matroids when k ≤ 6. In his Ph.D. dissertation, McMurray established the matroid conjecture for matroids of circumference four. I establish Reid and Wu’s conjecture for several classes of matroids which include those that have connectivity three, circumference five, and spanning circuits. For this talk I will focus on Reid and Wu’s conjecture for k = 3.

Adrian Wilson – University of Mississippi [email protected]

“Graph Groupoids and Their Topology”

I investigate graph groupoids and the path spaces associated with their unit spaces. It was shown by Paterson and Welch that for a general directed graph E, the path space X which is of the form Y ∪ Z, where Y is the set of finite paths and Z the set of infinite paths in E, is a locally compact metric space. I solved three main questions. For the first, a natural question that was asked by A. Kumjian (University of Nevada at Reno) in the case of the Cuntz graph E_∞ was how that topology relates to an earlier topology investigated by J. Renault (Orléans). I show that the two topologies are homeomorphic and so can be identified. I then discuss the graph groupoid for E in the general case. For this investigation, it is important to be able to use the axiomatic approach to groupoids, by showing that this is equivalent to the usual definition of a groupoid as a “small category with inverses”. The proof of this is given, and is the second main question answered by my dissertation. The third main question is to construct the graph groupoid G_E for E and prove that it is a second countable, locally compact Hausdorff groupoid. The construction and proof of this are given in the last section of my dissertation. (The proof of the corresponding result for row finite graphs had been sketched earlier in the work of, e.g., Kumjian, Raeburn and Renault.)

Michael Young – Carnegie Mellon University [email protected]

“Results in Ramsey Theory”

A typical result in Ramsey Theory states that if a given set of objects is partitioned into a finite number of parts, then there exists some significant object in one of the parts. The Pigeon Hole Principle is a trivial example. The Ramsey Number R(ℓ,m) is the smallest n such that every two-coloring (red and blue) of the edges of the complete graph on n vertices (K_n) contains a blue K_ℓ or a red K_m. R(k,k) is known for k < 5. I will be presenting a new method to find bounds on K_3 + I_3 and K_4 + I_4. Finding these bounds will prove to be useful in finding R(5,5).
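To make the definition of a Ramsey number concrete, the sketch below verifies the standard small case R(3,3) = 6 by exhaustive search. It is an illustration of the definition only, not the new method described in the abstract, and the function name is hypothetical.

```python
from itertools import combinations, product


def every_coloring_has_mono_triangle(n):
    """True if every red/blue coloring of the edges of K_n has a one-colored triangle."""
    edges = list(combinations(range(n), 2))
    index = {edge: i for i, edge in enumerate(edges)}
    # Each triangle is stored as the positions of its three edges in a coloring tuple.
    triangles = [
        (index[(a, b)], index[(a, c)], index[(b, c)])
        for a, b, c in combinations(range(n), 3)
    ]
    for coloring in product((0, 1), repeat=len(edges)):
        if not any(
            coloring[i] == coloring[j] == coloring[k] for i, j, k in triangles
        ):
            return False  # this coloring avoids monochromatic triangles
    return True


print(every_coloring_has_mono_triangle(5))
print(every_coloring_has_mono_triangle(6))
```

The search covers 2^10 colorings for K_5 and 2^15 for K_6, so it finishes quickly. The same exhaustive approach is hopeless for R(5,5), where the number of colorings is astronomically large, which is why bounds of the kind the abstract describes are of interest.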

III. DEVELOPMENT, ASSESSMENT AND UTILIZATION OF COMPLEX COMPUTER MODELS

A. Summer School on Design and Analysis of Computer Experiments, The IRMACS Centre, Simon Fraser University, August 11-16, 2006

Friday, August 11 – IRMACS Presentation Studio 8:00 - 9:00 Registration & IRMACS Centre Tour

8:15 - 9:00 Breakfast, Irmacs Atrium

9:00 - 9:30 Opening Remarks: Derek Bingham, Conference Coordinator James Berger, Director, SAMSI Pam Borghardt, Associate Director, The IRMACS Centre Brian Corrie, IRMACS Demo

9:30 - 10:00 Short course in computer experiments

10:00 - 10:15 Coffee Break, Irmacs Atrium

10:15 - 12:00 Short course in computer experiments

12:00 - 1:15 Lunch Break – Box Lunch Provided

1:15 - 2:30 Short course in computer experiments

2:30 - 2:45 Coffee Break, Irmacs Atrium

2:45 - 4:15 Short course in computer experiments

4:30 Reception, Irmacs Atrium

Saturday, August 12 – IRMACS Presentation Studio 8:30 - 9:00 Breakfast, Irmacs Atrium

9:00 - 10:00 Short course in computer experiments

10:00 - 10:15 Coffee Break, Irmacs Atrium

10:15 - 11:30 Short course in computer experiments

11:30 - 12:15 Computer Experiments at NCAR: Applications and Opportunities

12:45 Coach leaves SFU Bus Loop for excursion to Stanley Park

6:00 First coach returns to SFU from Stanley Park

9:30 Second Coach returns to SFU from Stanley Park

Sunday, August 13 – IRMACS Presentation Studio 8:30 - 9:00 Breakfast, Irmacs Atrium

9:00 - 10:30 Anthony O’Hagan, Building and using an emulator with GEM-SA Brian Williams, Los Alamos National Laboratory GPM: Software for Calibrating Computer Models to Experimental Data

10:30 -10:45 Coffee Break, Irmacs Atrium

10:45 - 11:30 Laura Swiler, Sandia National Laboratories The DAKOTA Toolkit and its use in Computational Experiments

11:30 - 12:00 Tom J. Santner, The Ohio State University A Tutorial on the PErK Program

12:00 - 1:30 Lunch Break

1:30 - 2:30 Jim Berger and Fei Liu, Duke University Rui Paulo, University of Bristol Jerry Sacks, NISS SAVE-1 and SAVE-2 for Computer Models

2:30 - 2:45 Coffee Break, Irmacs Atrium

2:45 - 3:45 Jim Berger and Fei Liu, Duke University Rui Paulo, University of Bristol Jerry Sacks, NISS SAVE-1 and SAVE-2 for Computer Models

3:45 - 4:45 Tutorial on software for cosmology problem

Monday, August 14 – IRMACS Presentation Studio

8:30 - 9:00 Breakfast, Irmacs Atrium

9:00 - 10:30 Fei Liu, Duke University Simulator Analysis and Validation Engine 2 Matt Taddy, University of California, Santa Cruz Multi-Resolution Treed Gaussian Processes Elaine Spiller, University at Buffalo Rare Events in Nonlinear Lightwave Systems

10:30 - 10:45 Coffee Break, Irmacs Atrium

10:45 - 12:15 Pritam Ranjan, Simon Fraser University Sequential Experiment Design for Contour Estimation from Computer Simulators Yan Lan, University of Michigan A Two-Stage Procedure for Change Point Estimation Dianne Bautista, The Ohio State University Nonparametric Estimation of the Covariance Function of Stationary Gaussian Processes

12:15 - 1:45 Lunch Break

1:45 - 2:45 Gang Han, The Ohio State University Calibration and Prediction for Computer Experiment Output Having Qualitative and Quantitative Input Variables Jason Loeppky, UBC Okanagan Successful Calibration: A Practitioner's Guide

2:45 - 3:00 Coffee Break, Irmacs Atrium

3:00 - 4:30 Bela Nagy, University of British Columbia Fast Bayesian Implementation (FBI) of Gaussian Process Regression Ying Hung, Georgia Institute of Technology Blind Kriging: A New Method for Developing Metamodels James D. Delaney, Georgia Institute of Technology Functionally Induced Priors for the Analysis of Physical Experiments

5:30 Conference Dinner At The Diamond Alumni Centre

Tuesday, August 15 8:30 - 9:00 Breakfast, Irmacs Atrium

9:00 - 9:15 Opening Remarks: Derek Bingham

9:15 - 10:45 Application in Cosmology

10:45 - 11:00 Coffee Break, Irmacs Atrium

11:00 - 12:30 Problem Solving

12:30 - 2:00 Lunch Break

2:00 - 3:30 Problem Solving

3:30 - 3:45 Coffee Break, Irmacs Atrium

3:45 - 5:15 Problem Solving

Wednesday, August 16 8:30 - 9:00 Breakfast, Irmacs Atrium

9:00 - 10:30 Problem Solving

10:30 - 10:45 Coffee Break, Irmacs Atrium

10:45 - 12:15 Problem Solving

12:15 - 1:45 Lunch Break

1:45 - 3:15 Presentations

3:15 - 3:30 Coffee Break, Irmacs Atrium

3:30 - 5:45 Presentations

5:45 Closing Remarks

Speaker Abstracts (in order of presentation) Sunday, August 13, 9:00 – 10:30 AM

Anthony O’Hagan, University of Sheffield

Building and using an emulator with GEM-SA

GEM-SA is user-friendly, Windows-based software for building a Gaussian process emulator and carrying out uncertainty and sensitivity analyses. This talk will illustrate how to use the software to build and validate an emulator of a computer code, and how to interpret the diagnostics and analyses it produces.

Brian Williams, Los Alamos National Laboratory

GPM: Software for Calibrating Computer Models to Experimental Data

GPM (Gaussian Process Modeling) is software in MATLAB for calibrating computer models to experimental data using a version of the Kennedy and O'Hagan model. Univariate and multivariate outputs are accommodated. For multivariate outputs, users have the flexibility to establish basis representations for the code output and discrepancy model that are suitable for each individual application. GPM facilitates the use of information from multiple data sources to inform on a common set of parameters to be calibrated. In addition, optimization options are available to assist users in baselining their computer models. GPM offers users some basic sensitivity analysis tools for assessing model output sensitivity to the input parameters. Future directions for GPM include hierarchical modeling options for calibration parameters and discrepancy across separate physical experiments.

Sunday, August 13, 10:45 – 11:30 AM

Laura Swiler, Sandia National Laboratories

The DAKOTA Toolkit and its use in Computational Experiments

The DAKOTA toolkit (Design Analysis Kit for Optimization and Terascale Applications) provides a flexible, extensible interface between analysis codes and iterative system analysis methods. DAKOTA contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, reliability, and stochastic finite element methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study capabilities. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed-integer nonlinear programming, or optimization under uncertainty. This talk will provide an overview of DAKOTA's capabilities with a focus on the uncertainty analysis and experimental design capabilities. Some examples will be presented.

Tom J. Santner, The Ohio State University

A Tutorial on the PErK Program

We will give an overview of the capabilities of the PErK (Parametric Empirical Kriging) software. Then we describe how to find and install it on Unix (Linux or Cygwin) systems. Finally we will run a series of PErK jobs that illustrate the program in action.

Sunday, August 13, 1:30 – 2:30 PM & 2:45 – 3:45 PM

Jim Berger and Fei Liu, Duke University Rui Paulo, University of Bristol Jerry Sacks, NISS

SAVE-1 and SAVE-2 for Computer Models

The Simulator Analysis and Validation Engine (SAVE) is a set of research software applications that implements the validation framework for computer models of Bayarri et al. (2005) and subsequent extensions. In this presentation, we will describe the main ideas behind this validation strategy, the scope of applicability of each of the software applications, and some of the implementation details, including its use in the context of specific validation problems. We will start by describing the basic ideas of the general validation framework, which comprises six steps. Each of these steps may or may not involve computational work, and this exposition will emphasize the computational tasks rather than the methodological aspects of the framework. In essence, one has to deal with the construction of an approximation to the output of the computer model, one has to estimate the unknown parameters in the statistical model relating reality and computer output, and one has to predict both reality and the output of the computer model in untried combinations of the inputs. The first module of the software bundle, SAVE-1, implements the strategy in the situation where the output of the computer model is a scalar, and there is potentially an uncertain parameter that takes the same value in all the field experiments. This is detailed precisely, and the utilization of the code is exemplified using a specific real-world problem, which will help describe the more technical details of the software. The second module, SAVE-2, deals with the situation where the output of the computer code is a highly irregular function of the inputs. The functional data is decomposed using wavelet representation techniques, and the validation strategy then proceeds by applying a hierarchical version of the scalar validation methodology to the wavelet coefficients, followed by transforming back to the functional data realm. The issue of uncertain controllable inputs in the field experiments is also tackled in this software. Again, technical details of the construction and utilization of the software are described using specific real-world applications.

Monday, August 14, 9:00 – 10:30 AM

Fei Liu, Duke University

Simulator Analysis and Validation Engine 2

A key question in evaluation of computer models is “Does the computer model adequately represent reality?” A six-step process for computer model validation is set out based on comparison of computer model runs with field data of the process being modeled. The methodology is particularly suited to treating the major issues associated with the validation process: quantifying multiple sources of error and uncertainty in computer models; combining multiple sources of information; and being able to adapt to different but related scenarios. Two complications that frequently arise in practice are the need to deal with highly irregular functional data and the need to acknowledge and incorporate uncertainty in the inputs. We develop methodology to deal with both complications. A key part of the approach utilizes a wavelet representation of the functional data, applies a hierarchical version of the scalar validation methodology to the wavelet coefficients, and transforms back, to ultimately compare computer model output with field output. The generality of the methodology is only limited by the capability of a combination of computational tools and the appropriateness of decompositions of the sort (wavelets) employed here.

Matt Taddy, University of California, Santa Cruz

Multi-Resolution Treed Gaussian Processes

Coupling Gaussian Processes with treed partitioning is an efficient way to model non-stationary behavior. I will discuss how this idea can be extended to deal with the common computer experiment situation where the data come from more than one model, and the models can be ordered in terms of fidelity. By partitioning over Gaussian Process models that incorporate multiple fidelity output, we maintain the structure and efficiency of the original TGP methodology. The methodology will be illustrated on example datasets.

Elaine Spiller, University at Buffalo

Rare Events in Nonlinear Lightwave Systems

The nonlinear Schroedinger equation (NLS) with a periodic, varying dispersion coefficient models the dynamics of light in optical communication systems and mode-locked lasers. The dispersion-managed nonlinear Schroedinger equation (DMNLS) is an averaged version of NLS which restores some symmetries that are lost in NLS when the dispersion coefficient is not constant. I will discuss these symmetries, the corresponding conservation laws, and modes of the linearized DMNLS. I will also discuss how these linearized modes can be utilized to guide importance-sampled Monte-Carlo simulations of rare events in dispersion-managed lightwave systems subject to noise. This study is pertinent because the performance of lightwave systems is limited by the occurrence of rare events, i.e., noise-induced errors.
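To make the idea of importance-sampled Monte Carlo concrete, here is a minimal sketch on a toy problem rather than on the DMNLS itself: estimating the tail probability P(Z > t) of a standard normal variable. The mean-shifted biasing distribution N(t, 1) and the function names are assumptions chosen for illustration; in the setting described above, the biasing would instead be guided by the linearized modes.

```python
import math
import random


def rare_event_probability(threshold, n_samples, seed=0):
    """Estimate P(Z > threshold) for Z ~ N(0, 1) by importance sampling.

    Samples are drawn from the biased density N(threshold, 1), so the rare
    region is hit about half the time. Each hit is weighted by the likelihood
    ratio phi(x) / phi(x - threshold) = exp(-threshold * x + threshold**2 / 2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            log_weight = -threshold * x + 0.5 * threshold ** 2
            total += math.exp(log_weight)
    return total / n_samples


def exact_tail(threshold):
    """Exact standard normal tail probability, for comparison."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))
```

With a threshold of 4, naive sampling would need on the order of millions of draws to see even a handful of events, whereas the reweighted estimator reaches a few percent relative error with tens of thousands.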

Monday, August 14, 10:45 AM – 12:15 PM

Pritam Ranjan, Simon Fraser University

Sequential Experiment Design for Contour Estimation from Computer Simulators

In many engineering applications, one is interested in identifying the inputs to a computer simulator that lead to a pre-specified output. In this talk we introduce statistical methodology that identifies the desired contour in the input space. The proposed approach has three main components. Firstly, a stochastic model is used to approximate the global response surface. The model is used as a surrogate for the underlying computer model and provides an estimate of the contour together with a measure of uncertainty, given the current set of computer trials. Then, a strategy for choosing subsequent computer experiments to improve the estimation of the contour is outlined. Finally, we discuss how the contour is extracted and represented. The methodology is illustrated with an example from a multi-class queuing system.

Yan Lan, University of Michigan

A Two-Stage Procedure for Change Point Estimation

Consider a constant regression model for a bounded covariate that has a single discontinuity (change point). It is assumed that one can sample the covariate at different values and measure the corresponding responses. Budget constraints dictate that a total of n such measurements can be obtained. The goal is to estimate accurately the location of the change point. A two-stage procedure is proposed and its properties examined, where at the first stage a proportion of the n points is sampled and the location of the change point estimated. Subsequently, the remaining points are sampled from an appropriately chosen neighborhood of the initial estimate of the change point and a new estimate is obtained. The asymptotic distribution of the least squares estimate is derived using ideas from empirical processes. The improved efficiency of the procedure is demonstrated using real and synthetic data. The problem is motivated by problems in engineering systems, where the response corresponds to cost functionals and the covariate to stress or loading levels of the underlying system.
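The two-stage idea can be sketched as follows on synthetic data. This is an illustrative sketch only: the covariate range [0, 1], the evenly spaced designs, the equal split of the budget between stages, the neighborhood half-width, and all function names are assumptions, not details taken from the talk.

```python
import random


def make_step_response(change, low=0.0, high=1.0, noise_sd=0.1, seed=0):
    """Synthetic noisy response with a single jump at `change` (hypothetical)."""
    rng = random.Random(seed)

    def respond(x):
        level = low if x <= change else high
        return level + rng.gauss(0.0, noise_sd)

    return respond


def fit_change_point(xs, ys):
    """Least-squares change point for a two-level step function.

    Returns the midpoint between the two neighbouring design points at which
    splitting the sorted data minimises the residual sum of squares.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    total_sq = sum(y * y for _, y in pairs)
    best_sse, best_d = float("inf"), None
    left = left_sq = 0.0
    for k in range(1, n):
        y = pairs[k - 1][1]
        left += y
        left_sq += y * y
        right, right_sq = total - left, total_sq - left_sq
        # Sum of squares about the mean on each side of the split.
        sse = (left_sq - left ** 2 / k) + (right_sq - right ** 2 / (n - k))
        if sse < best_sse:
            best_sse = sse
            best_d = 0.5 * (pairs[k - 1][0] + pairs[k][0])
    return best_d


def two_stage_estimate(respond, n, first_fraction=0.5, half_width=0.1):
    """Stage 1: coarse grid on [0, 1]. Stage 2: fine grid near the first estimate."""
    n1 = int(n * first_fraction)
    xs1 = [(i + 0.5) / n1 for i in range(n1)]
    d1 = fit_change_point(xs1, [respond(x) for x in xs1])
    lo, hi = max(0.0, d1 - half_width), min(1.0, d1 + half_width)
    n2 = n - n1
    xs2 = [lo + (hi - lo) * (i + 0.5) / n2 for i in range(n2)]
    d2 = fit_change_point(xs2, [respond(x) for x in xs2])
    return d1, d2
```

Because the second-stage points are concentrated in a short interval around the first estimate, their spacing is much finer than that of a single uniform design with the same budget, which is the source of the efficiency gain.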

Dianne Bautista, The Ohio State University

Nonparametric Estimation of the Covariance Function of Stationary Gaussian Processes

The estimation of the covariance function is of interest in predicting the outcome of a computer experiment using the Empirical Best Linear Unbiased Predictor (EBLUP). A valid covariance function must be positive definite. To guarantee this, conventional parametric estimation arbitrarily assumes that the covariance function belongs to a certain family indexed by a parameter θ, which is consequently estimated via maximum or penalized likelihood. To circumvent this arbitrariness, several non/semi-parametric approaches have been proposed. Four such estimators based on a single realization of a stationary Gaussian process are discussed. The data consist of (xi, yi), i = 1, 2, …, n, where xi ∈ ℝ^d, d ≥ 1, and yi = y(xi). These estimators are those introduced by Shapiro and Botha (1991), Hall and Patil (1994), Ong et al. (2002), and Elogne et al. (2003). These methods are compared to the Restricted Maximum Likelihood (REML) estimation procedure and also to each other with respect to mean square predictive error.

Monday, August 14, 1:45 – 2:45 PM

Gang Han, The Ohio State University

Calibration and Prediction for Computer Experiment Output Having Qualitative and Quantitative Input Variables

We propose statistical models for prediction and calibration that allow both qualitative and quantitative input variables. The model allows prediction of a computer code at an untested set of qualitative and quantitative inputs as well as quantifying the uncertainty in the prediction. In the case of calibration, both the physical experiment and computer code are allowed to depend on both types of variables. A Bayesian Qualitative and Quantitative Variable (QQV) model is constructed and implemented by Markov Chain Monte Carlo methodology. This model is compared with a frequentist approach and a Bayesian independence model in several examples. This is joint work with Thomas Santner and William Notz.

Jason Loeppky, UBC Okanagan

Successful Calibration: A Practitioner's Guide

Computer models to simulate physical phenomena are now widely available in engineering and science. Before relying on a computer model, a natural first step is often to compare its output with physical or field data, to assess whether the computer model reliably represents the real world. Field data, when available, can also be used to calibrate unknown parameters in the computer model. Calibration can be particularly problematic in the presence of systematic discrepancies between the computer model and field observations. In this talk we present results of a simulation study designed to assess how well the calibration parameter has been estimated and the conditions under which calibration is possible. Because both the computer model data and the physical observations are simulated from a Gaussian process, the uncertainty due to using an incorrect model does not arise. This gives us a more accurate picture of the problems that can arise when attempting to calibrate the model in the presence of systematic discrepancy. Joint work with William Welch and Brian Williams.

Monday, August 14, 3:00 – 4:30 PM

Bela Nagy, University of British Columbia

Fast Bayesian Implementation (FBI) of Gaussian Process Regression

The traditional prediction variance formula for Gaussian Process Regression (Kriging) underestimates the true uncertainty because it does not incorporate the variability due to estimating the model parameters. This leads to overly optimistic prediction bands about the predictor. We propose a computationally cheap Bayesian alternative in the absence of subjective prior distributions of the parameters. Simulations show that the resulting prediction bands have better frequentist properties (in terms of coverage probabilities) than the ones based on the traditional method. Joint work with Stella Karuri, Jason Loeppky, and William J. Welch.
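For readers unfamiliar with the traditional formula, the following is a minimal sketch of the standard plug-in ordinary kriging predictor and its prediction variance in one dimension. It is not the FBI method of the talk. The Gaussian correlation function, the fixed value of the correlation parameter `theta`, and the function names are assumptions made for illustration; treating `theta` as known is precisely the simplification that leaves parameter-estimation variability out of the variance.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def corr(x1, x2, theta):
    """Gaussian correlation between two scalar inputs (assumed form)."""
    return math.exp(-theta * (x1 - x2) ** 2)


def ordinary_kriging(xs, ys, x_new, theta=10.0):
    """Return the plug-in predictor and prediction variance at x_new."""
    n = len(xs)
    R = [[corr(a, b, theta) for b in xs] for a in xs]
    r = [corr(x_new, a, theta) for a in xs]
    Rinv_1 = solve(R, [1.0] * n)
    Rinv_y = solve(R, ys)
    Rinv_r = solve(R, r)
    one_Rinv_one = sum(Rinv_1)
    mu = sum(Rinv_y) / one_Rinv_one            # generalized least squares mean
    resid = [y - mu for y in ys]
    Rinv_resid = [ry - mu * r1 for ry, r1 in zip(Rinv_y, Rinv_1)]
    sigma2 = sum(a * b for a, b in zip(resid, Rinv_resid)) / n
    pred = mu + sum(a * b for a, b in zip(r, Rinv_resid))
    u = 1.0 - sum(Rinv_r)                      # 1 - 1'R^{-1}r
    r_Rinv_r = sum(a * b for a, b in zip(r, Rinv_r))
    var = sigma2 * (1.0 - r_Rinv_r + u * u / one_Rinv_one)
    return pred, max(var, 0.0)                 # clamp tiny negative rounding
```

At a design point the predictor reproduces the observed value and the variance collapses to zero; away from the design the variance depends on `theta`, so any error in the plugged-in value propagates silently into the prediction bands.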

Ying Hung, Georgia Institute of Technology

Blind Kriging: A New Method for Developing Metamodels

Kriging is a useful method for developing metamodels for product design optimization. The most popular kriging method, known as ordinary kriging, uses a constant mean in the model. In this article, a modified kriging method is proposed, which has an unknown mean model. Therefore it is called blind kriging. The unknown mean model is identified from experimental data using a Bayesian variable selection technique. Many examples are presented which show remarkable improvement in prediction using blind kriging over ordinary kriging. Moreover, the blind kriging predictor is easier to interpret and seems to be more robust to misspecification in the correlation parameters. This is joint work with V. Roshan Joseph and Agus Sudjianto.

James D. Delaney, Georgia Institute of Technology

Functionally Induced Priors for the Analysis of Physical Experiments

Specifying a prior distribution for the large number of parameters in the linear statistical model is a difficult step in the Bayesian approach to the design and analysis of experiments. Here we address this difficulty by proposing the use of functional priors and then by working out important details for three and higher level experiments. One of the challenges presented by higher level experiments is that a factor can be either qualitative or quantitative. We propose appropriate correlation functions and coding schemes so that the prior distribution is simple and the results easily interpretable. The prior incorporates well known experimental design principles such as effect hierarchy and effect heredity, which helps to automatically resolve the aliasing problems experienced in fractional designs. (Joint work with Dr. V. Roshan Joseph.)

B. Development, Assessment and Utilization of Complex Computer Models Opening Workshop and Tutorials September 10-14, 2006

Sunday, September 10, 2006 Radisson Hotel RTP Room H, 3rd Floor

11:00-12:15 PM Brunch and Registration

12:15-12:30 PM Welcome Jim Berger, SAMSI

12:30-1:30 PM “Tutorial Lecture on Models of Granular Materials” Bruce Pitman, (University at Buffalo)

1:30-1:45 PM Break

1:45-3:30 PM Two Invited Talks on Models of Granular Materials: Martin Bazant, (Massachusetts Institute of Technology): “Stochastic Plasticity: A Multi-scale Model for Granular Flow” Isaac Goldhirsch, (Tel Aviv University): “Some Open Problems in Granular Matter”

3:30-4:00 PM Break

4:00-5:30 PM New Researchers Session 1: Dorin Drignei, (National Center for Atmospheric Research): “Parameter Estimation for Computationally Intensive Nonlinear Regression with an Application to Climate Modeling” Wendy Parker, (Ohio University): “How to Think About Models and Their Evaluation—a proposal” Gang Han, (Ohio State University): “Analysis of Computer Experiment Output Having Qualitative and Quantitative Input Variables” Karen Daniels, (North Carolina State University): “Rates of Mixing and Segregation in Sheared Granular Materials”

Monday, September 11, 2006 Radisson Hotel RTP Room H, 3rd Floor

8:15-9:00 AM Continental Breakfast and Registration

9:00-10:00 AM Tutorial Lecture on Biological Models Darren Wilkinson, (University of Newcastle)

10:00-10:15 AM Break

10:15-Noon Two invited talks on Biological Models: Reinhard Laubenbacher, (Virginia Bioinformatics Institute): “Complex Models in Systems Biology” Mike West and Lingchong You, (Duke University): “Modeling Dynamic Cellular Networks”

Noon-1:15 PM Lunch

1:15-3:00 PM Two invited talks on Methodology: Laura Swiler, (Sandia National Laboratories): “Building Credibility in Computational Simulations Through Verification” Max Morris, (Iowa State University): “Nonstationary Twists on Stationary Process Models”

3:00-3:30 PM Break

3:30-5:15 PM Panel Discussion on Methodology Susie Bayarri, (University of Valencia), moderator Michael Goldstein, (University of Durham) Tony O'Hagan, (University of Sheffield) Henry Wynn, (London School of Economics)

6:30-8:30 PM Poster Session and Reception Room AB, 2nd Floor (Poster presenters please arrive early - posters set-up by 6:15 pm)

Tuesday September 12, 2006 Radisson Hotel RTP Room H, 3rd Floor

8:15-9:00 AM Continental Breakfast and Registration

9:00-10:00 AM Tutorial Lecture on Engineering Models Dave Higdon, (Los Alamos National Laboratory)

10:00-10:15 AM Break

10:15-Noon Panel Discussion on Engineering Models Tom Santner, (Ohio State University), Moderator Don Bartel, (Cornell University), on Biomechanics Angela Patterson, (General Electric), on Aeronautical Applications Jerry Sacks, (National Institute of Statistical Sciences), on Statistics

Noon-1:15 PM Lunch

1:15-3:00 PM Two invited talks on Models of Granular Materials: Bob Behringer, (Duke University): “Statistical Properties of Dense Granular Materials” Peter J. Mucha, (University of North Carolina): “Particle-Based Animation of Granular Materials”

3:00-3:30 PM Break

3:30-4:45 PM Panel Discussion on Models of Granular Materials Sorin Mitran, (University of North Carolina), Moderator Luis Pericchi, (University of Puerto Rico) Bruce Pitman, (University at Buffalo)

4:45-5:30 PM New Researchers Session 2: Crystal Linkletter, (Simon Fraser University): “Reference Distribution Variable Selection for Gaussian Process Models” John Paul Gosling, (University of Sheffield): “Quantifying Uncertainty in the Biospheric Carbon Flux for England and Wales”

Wednesday, September 13, 2006 Radisson Hotel RTP Room H, 3rd Floor

8:30-9:00 AM Continental Breakfast and Registration

9:00-10:00 AM Tutorial Lecture on Environmental/Ecological Models Jim Clark, (Duke University)

10:00-10:15 AM Break

10:15-Noon Two invited talks on Environmental/Ecological Models: Jasper A. Vrugt, (Los Alamos National Laboratory): “Calibration and Uncertainty Assessment of Environmental Models: Methods and Applications” Nancy Nichols, (University of Reading): “Getting Started: Data Assimilation for Very Large Inverse Problems in Environmental Science”

Noon-1:15 PM Lunch by Subprogram to Identify Working Groups

1:15-3:00 PM Two invited talks on Climate Models: Bette L. Otto-Bliesner, (National Center for Atmospheric Research): “Studying Climate Change with Climate Models” Mark Berliner, (Ohio State University): “Bayesian Treatment of Computer Model Output as Data”

3:00-3:30 PM Break

3:30-4:45 PM Panel Discussion on Climate Models Montse Fuentes, (North Carolina State University) Jonathan Rougier, (University of Durham) Leonard Smith, (Oxford University)

4:45-5:30 PM New Researchers Session 3: Zhiguang Qian, (University of Wisconsin-Madison): “A Structural Equation Method for Temperature Modeling in Data Center Computer Experiment” Genetha Anne Gray, (Sandia National Laboratories): “Calibration, Validation, and Verification of an Electrical Circuit Simulation”

Thursday, September 14, 2006 Radisson Hotel RTP Room H, 3rd Floor

8:30-9:00 AM Continental Breakfast and Registration

9:00-10:45 AM Two invited talks on Environmental/Ecological Models: Christine A. Shoemaker, (Cornell University): “Continuous Optimization of Multi-Modal Computationally Expensive Models with Environmental Applications” Robert L. Wolpert, (Duke University): “Bayesian Semiparametric Space-Time Models”

10:45-11:00 AM Break

11:00-Noon Panel on Environmental/Ecological Models Peter Reichert, (Swiss Federal Institute of Aquatic Science and Technology – EAWAG), Moderator Ken Reckhow, (Duke University) Steve Sain, (National Center for Atmospheric Research)

Noon-1:15 PM Lunch by Subprogram to Identify Working Groups

1:15-2:30 PM Working Group Formation and Meeting

2:30-3:00 PM Break

3:00-4:00 PM Working Group Reports

Speaker Abstracts

Donald L. Bartel, Cornell University, Department of Mechanical and Aerospace Engineering

“The Role of Computer Experiments in Evaluating Biomechanical Systems”

Biomechanical systems, such as bone-implant systems for total joint replacement or fracture healing, involve the interaction of prostheses and other implants with living tissues. These composite structures involve incompletely known interface conditions between biological and engineered materials, tissue properties that vary from patient to patient and in the same patient over time, and variable loads. When new implants are introduced, safety and efficacy must be established to obtain approval by the FDA, a process that involves in vitro and computational experiments along with clinical studies.

Finite element analyses have been used extensively to study the characteristics of contemporary joint replacements. The performance of these composite structures is a function of both design and environmental variables. Design variables, such as implant material and geometry and the desired position and orientation of the implant with respect to the bone, are under the control of the engineer and surgeon designers. Environmental variables consist of the deviations in position and orientation, which occur due to the limited precision of surgical procedures, and patient variables: the properties of the bone supporting the prosthesis vary from patient to patient and in the same patient over time due to aging or disease, and joint loads also vary according to patient weight and activity. It is impossible to obtain in vivo, patient-specific data; at best the environmental variables are estimated stochastically.

These systems are highly nonlinear because of implant material and tissue properties and because of the complex contact between the implant and the bone. Therefore, finite element analyses may require hours to days for a single analysis depending upon the complexity of the problem and the computational resources available. Optimal design methods based on typical search procedures are too expensive to be effective. Therefore, we have developed statistically based optimal design methods to study these systems. This approach makes it possible to determine the relative influence of design variables and environmental variables on structural performance. One important goal of this work is to determine robust designs that are insensitive to patient and surgical variables. A second goal is to calibrate computational and in vitro experiments.

Martin Bazant, Massachusetts Institute of Technology, Department of Mathematics

“Stochastic Plasticity: A Multiscale Model for Granular Flow”

There has been much recent interest in modeling granular materials. Fast, dilute granular flows are well described by kinetic theory and hydrodynamics of inelastic gases, but there has not yet emerged any general theory -- statistical or continuum -- for dense granular flows. For example, no existing model can describe both draining silos and Couette cells, even qualitatively. Here, we present a cooperative mechanism for random packing dynamics based on diffusing "spots" of free volume. The Spot Model can produce very realistic silo drainage, in close agreement with discrete-element simulations of 400,000 viscoelastic spheres with frictional contacts. For general dense flows, we drive the dynamics via a “stochastic flow rule” for Mohr-Coulomb plasticity, where stresses are at incipient yield, on average. Spots act as carriers of plastic deformation, analogous to dislocations in a crystal, and perform random walks along slip lines, biased by a local fluidization force. In the continuum limit, this simple model can accurately describe many granular flows in silos, Couette cells, plate-dragging, and heaps, with no adjustable parameters, other than the friction angle and the spot size (set by velocity correlations).

Bob Behringer, Duke University, Department of Physics

“Statistical Properties of Dense Granular Materials”

I will present selected results to characterize the mean and statistical properties of dense granular materials. The introduction will set the background for the qualitative nature of fluctuations in dense granular systems, and some of the underlying mechanisms. A key ingredient is the spatial inhomogeneity in how forces are transmitted in dense granular materials; forces are carried primarily on filamentary structures known as force chains. These are visible using photoelastic techniques, which we have exploited to characterize the dynamics of Couette shear flow, the nature of the jamming transition, and the micromechanical properties of plastic failure. We will also argue that force structures show percolation-like behavior. We also consider the nature of diffusion in dense granular materials. We show that within the shear band which typically forms in granular Couette flow, diffusion is Brownian, within experimental resolution, although it is strongly affected by the analogue of Taylor dispersion. We also show that the concept of shear transformation zones, borrowed recently from metal plasticity by Falk, Langer, and Lemaitre, can be applied to the non-affine motion that occurs in the shear band.

Mark Berliner, Ohio State University, Department of Statistics

“Bayesian Treatment of Computer Model Output as Data”

Hierarchical Bayesian analysis provides a framework for the incorporation of computer model output into statistical models. The approach relies on the formal treatment of model output as data. I review approaches for constructing statistical data models for computer model output and then indicate how this step is absorbed into a hierarchical Bayesian model for the processes of interest. An interesting aspect of the formulation enables the design of superensemble (multi-model) computer experiments. Some examples will be reviewed, but the focus will be on a recent application to climate forecasting.

James Clark, Duke University, Nicholas School of the Environment

“Tutorial Lecture on Environmental/Ecological Models”

Karen Daniels, North Carolina State University, Department of Physics

“Rates of Mixing and Segregation in Sheared Granular Materials”

Granular materials typically segregate by size under shear, with the smaller particles moving in the direction of gravity and the larger particles accumulating at the top. We perform experiments in an annular cell continuously sheared from below, in which two sizes of glass spheres are initially placed in two unstably-stratified horizontal layers (smaller over larger). We observe the rate of mixing and re-segregation as a function of particle size ratio, shear speed, and confining pressure. The segregation rate is found to be exponential in time, and quite sensitive to the choice of boundary condition (free surface or constant pressure).

Dorin Drignei, National Center for Atmospheric Research, Geophysical Statistics Project

“Parameter Estimation for Computationally Intensive Nonlinear Regression with an Application to Climate Modeling”

Nonlinear regression is a useful statistical tool, relating observed data and a nonlinear function of unknown parameters. When the parameter-dependent nonlinear function is computationally intensive, a straightforward regression analysis by maximum likelihood is not feasible. The method presented in this talk proposes to construct a faster-running surrogate for such a computationally intensive nonlinear function, and to use it in a related nonlinear statistical model that accounts for the uncertainty associated with this surrogate. The statistical method is applied to a model calibration problem where the computationally intensive nonlinear function is a climate model.

Isaac Goldhirsch Tel-Aviv University Department of Fluid Mechanics and Heat Transfer [email protected]

“Some Open Problems in Granular Matter”

The talk will focus on open problems in granular gases and granular solids, contrasting those that are common to most many-body (including molecular) systems with those that are specific to granular matter. The latter can be mostly traced to the dissipative nature of the interactions of the constituents, the lack of scale separation in most granular systems, their metastable nature and more. Some recent results will be mentioned as well, time allowing.

John-Paul Gosling University of Sheffield Department of Probability and Statistics [email protected]

“Quantifying Uncertainty in the Biospheric Carbon Flux for England and Wales”

A crucial issue in the current global warming debate is the effect of vegetation and soils on carbon dioxide (CO2) concentrations in the atmosphere. Vegetation can extract CO2 through photosynthesis, but respiration, decay of soil organic matter and disturbance effects such as fire return it to the atmosphere. The balance of these processes is the net carbon flux. In order to estimate the biospheric carbon flux for England and Wales, the statistical problem of inference for the sum of multiple outputs from a complex deterministic computer code whose input parameters are uncertain is addressed. A Gaussian process model is used to build emulators of the multiple code outputs, and Bayesian uncertainty analysis is then used to propagate uncertainty in the input parameters through to uncertainty on the aggregated output. This talk gives an overview of the analysis carried out in this application and focuses on how we can account for the different sources of uncertainty.

Genetha Anne Gray Sandia National Laboratories Computational Sciences & Mathematics Research [email protected]

“Calibration, Validation, and Verification of an Electrical Circuit Simulation”

Significant advances in computing capabilities, decreasing storage costs, and the rising costs associated with physical experiments have contributed to an increase in the use of numerical modeling and simulation. The inclusion of computer simulations in the study and design of complex engineering systems has introduced many new challenges. For example, code verification must be used to confirm that the underlying equations are being solved correctly. In addition, validation metrics must be carefully chosen in order to explicitly compare experimental and computational results and quantify the uncertainties in these comparisons.

Overall, the validation and verification process for computational experimentation can provide the best estimate of what can happen and the likelihood of it happening when uncertainties are taken into account. In this talk, we will discuss the validation process for an electrical circuit simulator. In particular, we will focus on the associated parameter extraction problem.

Gang Han Ohio State University Department of Statistics [email protected]

“Analysis of Computer Experiment Output Having Qualitative and Quantitative Input Variables”

We propose statistical models that allow prediction of a computer code at an untested set of qualitative and quantitative inputs as well as quantifying the uncertainty in the prediction. In the case of calibration, both the physical experiment and the computer code are allowed to depend on two types of variables. A Hierarchical Bayesian Qualitative and Quantitative Variable (HQQV) model is constructed and implemented by Markov Chain Monte Carlo methodology. This model is compared with a frequentist approach, a Bayesian independence model, and an autoregressive model in several examples.

Dave Higdon Los Alamos National Laboratory Statistical Sciences Group [email protected]

“Tutorial Lecture on Engineering Models”

Reinhard Laubenbacher Virginia Polytechnic Institute and State University Virginia Bioinformatics Institute [email protected]

“Complex Models in Systems Biology”

The increased availability of data in the life sciences has made it possible to build computational models at the system level. This holds true across scales, from the molecular to the ecosystem scales. Depending on the type of system, different modeling frameworks are being used, from the traditional systems of ordinary differential equations to stochastic models, as well as discrete models such as interaction-based models. The complexity of building, simulating, and validating large-scale models creates mathematical, computational, and biological challenges. This talk will describe some of these in the setting of discrete models and outline a mathematical program to address them.

Crystal Linkletter Simon Fraser University Department of Statistics and Actuarial Science [email protected]

“Reference Distribution Variable Selection for Gaussian Process Models”

In many situations, simulation of complex phenomena requires a large number of inputs and is computationally expensive. Identifying the inputs which most impact the system so that these factors can be further investigated can be a critical step in the scientific endeavor. In computer experiments, it is common to use a Gaussian spatial process to model the output of the simulator. These models are very flexible, but can make screening challenging. We introduce a new, simple method for identifying active factors in computer screening experiments. The approach is Bayesian and only requires the generation of a new inert variable in the analysis; however, in the spirit of frequentist hypothesis testing, the posterior distribution of the inert factor is used as a reference distribution against which the importance of the experimental factors can be assessed.

Max Morris Iowa State University Department of Statistics [email protected]

“Nonstationary Twists on Stationary Process Models”

In a typical computer experiment paradigm, a variance-stationary Gaussian process is used as a prior for the output of a computer model, and a meta-model (or “fast predictive approximation”) of the output function is derived from this. While the process need not be stationary, it is often difficult to know the pattern of nonstationarity that would be most appropriate. One result of using popular stationary variance models is that interval predictors often under-cover the actual computer model.

In this talk, we will describe two different approaches to using stationary process model “technology” to generate data-driven nonstationary meta-models of output. One approach is developed specifically for models arising from systems of ordinary or partial differential equations, based on the use of “truncation errors” generated by numerical solvers. The other approach yields a stationary predictor indexed by an “augmented” space of inputs.

Nancy Nichols University of Reading Department of Mathematics and Meteorology [email protected]

“Getting Started: Data Assimilation for Very Large Inverse Problems in Environmental Science”

For the very large systems that arise in the environmental sciences, the available data are not sufficient to initiate a complex computational forecasting model. The technique of data assimilation enables measured observations (over time) to be combined with model predictions to generate accurate estimates of the system states – both current and future. The problem of data assimilation is essentially an ill-posed inverse problem. An overview of data assimilation will be presented here, including an introduction to the problem and its mathematical formulation. The major types of assimilation method for treating very large nonlinear systems will be described and the application of these schemes will be illustrated with simple test examples.

In reality, environmental models do not represent the system behaviour exactly and errors arise due to lack of resolution and inaccuracies in physical parameters, boundary conditions and forcing terms. A technique for estimating systematic and time-correlated errors as part of the assimilation procedure will also be described here. The modified method determines a correction term that compensates for model error and leads to improved predictions of the system states. The effectiveness of the new procedure is demonstrated in an ocean application. In the concluding remarks additional current research issues will be discussed.
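
As a minimal illustration of the basic idea of combining a model prediction with a measured observation, the sketch below applies the standard scalar minimum-variance (Kalman-type) analysis update. The function name, variable names, and numerical values are illustrative assumptions and are not taken from the talk; the schemes for very large nonlinear systems and the model-error correction described above are considerably more elaborate.

```python
def analysis_update(background, background_var, observation, observation_var):
    """Combine a model forecast (background) with one direct observation.

    Returns the analysis (updated state estimate) and its error variance,
    assuming unbiased and mutually uncorrelated background and observation errors.
    """
    # Weight given to the observation: large when the forecast is uncertain.
    gain = background_var / (background_var + observation_var)
    analysis = background + gain * (observation - background)
    analysis_var = (1.0 - gain) * background_var
    return analysis, analysis_var


# Illustrative values only: forecast 10.0 (variance 4.0), observation 12.0 (variance 1.0).
analysis, analysis_var = analysis_update(10.0, 4.0, 12.0, 1.0)
```

In this toy setting the analysis is pulled toward whichever source has the smaller error variance, and its variance is smaller than either input variance.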

Bette Otto-Bliesner National Center for Atmospheric Research Climate Change Research [email protected]

“Studying Climate Change with Climate Models”

Climate models are key tools that scientists use to study our changing climate. These climate models are based on models that have been developed to predict our weather. They have been expanded to include the circulation of the ocean and interactions with vegetation and the atmospheric trace gases such as carbon dioxide and methane. The climate system is described in terms of basic physical laws. Climate models solve a series of equations representing these laws on a three-dimensional grid that covers the Earth and extends from the ocean floor to the top of the atmosphere.

Climate model results, like measurements, have uncertainties associated with them. For climate models, the uncertainties depend on the grid size used to represent the atmosphere and ocean, the time step used to integrate the series of equations forward in time, and incomplete representation of important processes in the model. For example, clouds and precipitation in the atmosphere often occur on spatial scales smaller than the grid that can be reasonably used on the supercomputers available today. In addition, because of the relatively long response times of some processes such as glacial-interglacial changes in atmospheric carbon dioxide and ice sheets, these processes are treated as forcings rather than feedbacks in many climate simulations.

Climate models are first judged on how well they reproduce the present-day climate. Past climate variability gives us a longer-term perspective to test our understanding of the climate system and the interactions among the atmosphere, oceans, and land surface. At NCAR, we use information about past variability to see how realistic the climate models being used to project future climate change are when the large climate forcings of the past are imposed.

Wendy Parker University of California-San Diego Science Studies Program [email protected]

“How to Think About Models and Their Evaluation—A Proposal”

I suggest that models be thought of as representational *tools*. On such a view, model confirmation is not confirmation of the truth of modeling assumptions but rather confirmation of the *adequacy* of a model for one or more purposes. I then suggest that model evaluation be conceptualized as a severe testing enterprise, and consider the advantages of thinking of model evaluation in this way rather than as an activity aimed to build confidence or credibility. Time permitting, I will illustrate some of these ideas with an example from climate modeling.

Angela Patterson GE Global Research [email protected]

“Data + Engineering Simulations for More Cost-Effective Engineering Design”

Engineering design depends heavily on the use of physics-based computer simulations to approximate relationships between inputs and outputs of a physical system. Some of the realities/challenges of using these models in practice are the following:

1. The models will be used beyond the range/space in which they were tested or calibrated. Data available for calibrating/testing the models are sparse and not always relevant for the specific application under design.
2. Inadequacies in the physics. It can be very costly and time-intensive to improve the physics. Model inadequacies may go undetected.
3. The models run slowly, so surrogate models (metamodels) are required. Empirical, faster running models are used to approximate the complex, physics-based models. Building the surrogate isn't trivial when the number of design parameters is large.
4. Uncertainties are present. Uncertainties in measured values of both Xs and Ys (e.g. Measurement System Analysis) must be accommodated.

To overcome these challenges, we need cost-effective test plans and the ability to fuse data with models for improved prediction capability. The purpose of this talk is to motivate, with examples, the need for good strategies and methodologies in this problem space.

Bruce Pitman State University of New York at Buffalo Department of Mathematics [email protected]

“Tutorial Lecture on Models of Granular Materials”

Zhiguang Qian University of Wisconsin-Madison Department of Statistics [email protected]

“A Structural Equation Method for Temperature Modeling in Data Center Computer Experiment”

Temperature modeling is key to designing and running a reliable data center with many computer components operating and generating heat constantly. How different configurations affect the data center thermal distribution is largely unknown because the physical thermal process is complex and depends on many factors. Constrained by time and cost, it is often difficult to conduct physical experiments to obtain detailed temperature measurements in the data center. Computer experiments based on computational fluid dynamics are widely used as a proxy to study the air movement and temperature distribution mechanisms. A statistical method based on latent variables is introduced in this article for analyzing the multivariate temperature readings produced by the computer experiment. A two-stage estimation procedure is developed for the proposed model by making use of ordinary least squares estimation and a pseudo-likelihood method. Also discussed is a method using the fitted statistical model as a surrogate for various data center temperature management tasks. Joint work with Yasuo Amemiya at IBM T. J. Watson Research Center.

Christine Shoemaker Cornell University Department of Civil and Environmental Engineering and Operations Research [email protected]

“Continuous Optimization of Multi-modal Computationally Expensive Models with Environmental Applications”

Many important problems in engineering and science require optimization of a computationally expensive (costly) function. These applications include calibration of model parameters to data and/or optimizing a design or operational plan to meet an economic objective. With computationally expensive functions (like nonlinear systems of partial differential equations), this optimization is made difficult by the limited number of model simulations that can be done because each simulation takes a long time (e.g. an hour or more). The optimization problem is even more difficult if it has multiple local optima, thereby requiring a global optimization algorithm.

Our new algorithms use function approximation methods and experimental design to approximate the objective function based on previous costly function evaluations. Function approximation is combined with locations of previous costly function evaluations to select iteratively the next costly function evaluation. The theorem for convergence to the global minimum will be described.

Numerical algorithm comparisons will be presented for test functions and for an environmentally based partial differential equation model that requires 3 hours to run for each simulation. This nonlinear model (based on fluid mechanics and chemical reactions) describes the transport of water and pollutants in a groundwater aquifer. The optimization is used for calibration of the model by selecting the parameter values (decision variables) that best fit measured data. The parameter surface is multi-modal, so this is a global optimization problem. The results indicate that the Regis and Shoemaker method generally gives better results for global optimization test problems and the environmental model than alternative methods when the number of model simulations is limited.

I will also discuss briefly a new joint NSF project on using our function approximation optimization methods in the context of Bayesian analysis of uncertainty. In this project we are combining optimization for calibration with an assessment of the uncertainty in calibrated parameter estimates and in calibrated model output based on input data.

All this work has been done jointly with Rommel Regis. Parts of the seminar will discuss work done with Pradeep Mugunthan, David Ruppert, and Nikolai Blizniouk.
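
The iterative selection of the next costly evaluation described in this abstract can be sketched on a toy problem. This is a simplified illustration under stated assumptions, not the Regis and Shoemaker method: an inverse-distance-weighted interpolant stands in for the function approximation, the one-dimensional test function, the candidate grid, the budget, and the weight are all hypothetical, and no convergence property is claimed for it.

```python
import math


def expensive_model(x):
    # Hypothetical stand-in for a costly simulation; it has several local minima.
    return (x - 0.7) ** 2 + 0.1 * math.sin(20.0 * x)


def idw_surrogate(x, xs, ys):
    # Inverse-distance-weighted prediction from previously evaluated points.
    num, den = 0.0, 0.0
    for xi, yi in zip(xs, ys):
        d = abs(x - xi)
        if d == 0.0:
            return yi
        w = 1.0 / d ** 2
        num += w * yi
        den += w
    return num / den


def normalize(values):
    # Rescale to [0, 1]; a constant list maps to all zeros.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def optimize(budget=15, weight=0.7):
    grid = [i / 200 for i in range(201)]
    xs = [0.0, 0.5, 1.0]                       # small initial design
    ys = [expensive_model(x) for x in xs]
    while len(xs) < budget:
        candidates = [c for c in grid if c not in xs]
        pred = normalize([idw_surrogate(c, xs, ys) for c in candidates])
        dist = normalize([min(abs(c - x) for x in xs) for c in candidates])
        # A low predicted value is good; a large distance from old points is good.
        scores = [weight * p + (1.0 - weight) * (1.0 - d)
                  for p, d in zip(pred, dist)]
        x_next = candidates[min(range(len(candidates)), key=lambda i: scores[i])]
        xs.append(x_next)
        ys.append(expensive_model(x_next))     # the only costly call per iteration
    i_best = min(range(len(ys)), key=lambda i: ys[i])
    return xs[i_best], ys[i_best], xs, ys


x_best, y_best, xs, ys = optimize()
```

The weight trades off exploiting the approximation against exploring regions far from previous evaluations, which is one simple way to avoid stalling in a local optimum when the number of simulations is limited.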

Recent References

Mugunthan, P., C.A. Shoemaker, R.G. Regis, “Comparison of Function Approximation, Heuristic and Derivative-based Methods for Automatic Calibration of Computationally Expensive Groundwater Bioremediation Models,” Water Resources Research, Vol. 41, W11427, doi:10.1029/2005WR004134, Dec. 2005.

Regis, R.G., C.A. Shoemaker, “A Stochastic Radial Basis Function Method for the Global Optimization of Expensive Functions,” INFORMS Journal on Computing, in press, 2006.

Mugunthan, P., C.A. Shoemaker, “Assessing the Impacts of Parameter Uncertainty for Computationally Expensive Groundwater Models,” Water Resources Research, in press, 2006.

Laura Swiler Sandia National Laboratories Optimization and Uncertainty Estimation [email protected]

“Building Credibility in Computational Simulations through Verification and Validation”

This talk will discuss the current state of verification and validation (V&V) and uncertainty quantification (UQ) at Sandia and the Dept. of Energy national laboratories. Four areas that are key to improving our credibility and predictive capability are physics modeling fidelity, verification activities, validation activities, and uncertainty quantification. Research needs in each of these areas will be discussed, along with the need to develop methods to assess the credibility of code when it is used outside a domain in which it has been validated. Three “challenge problems” that focus on model validation will be presented.

Jasper Vrugt Los Alamos National Laboratory Earth and Environmental Sciences Division [email protected]

“Calibration and Uncertainty Assessment of Environmental Models: Methods and Applications”

The field of earth sciences is experiencing rapid changes as a result of the growing understanding of environmental physics, along with recent advances in measurement technologies, and dramatic increases in computing power. More complex, spatially explicit computer models are now possible, allowing for a more realistic representation of the system of interest. However, application of these models is not straightforward: many of the spatially distributed parameters and internal states in these models require calibration/updating before meaningful predictions can be made. Classical parameter estimation and data assimilation methods, originally developed for low-dimensional lumped problems, are frequently used. However, strong model nonlinearity, high dimensionality, and significant measurement and model structural uncertainty hamper the use of such methods in complex simulation models. Also, spatially explicit models typically simulate several output fluxes for which measurement data are available and must be properly assimilated. In this talk I will present methods for improved calibration and uncertainty assessment of environmental models. In particular, I will discuss different methods for probabilistic forecasting, and highlight new concepts of model averaging and genetically adaptive multi-method search strategies for single and multi-objective model calibration. The various methods are illustrated using examples taken from surface and subsurface hydrology, meteorology, biology, and applied mathematics.

Darren Wilkinson Newcastle University Department of Mathematics & Statistics [email protected]

“Tutorial Lecture on Biological Models”

Robert Wolpert Duke University Institute of Statistics and Decision Sciences [email protected]

“Bayesian Semiparametric Space-Time Models”

A new class of semi-parametric Bayesian models is introduced for spatial, temporal, and spatio-temporal data, generalizing the kernel convolution of Lévy random fields. The method is useful for building flexible spatio-temporal models that can accommodate non-Gaussian non-stationary spatio-temporal data while keeping the computation feasible even for large data sets. The methods are illustrated in an application to sulfur dioxide monitoring in mid-Atlantic states.

Lingchong You Duke University Institute of Genome Sciences and Policy [email protected]

Mike West Duke University Institute of Statistics and Decision Sciences [email protected]

“Modelling Dynamic Cellular Networks”

We will present tutorial-level discussion of concepts, models and methods that feature in our work on modelling, simulation, analysis and design of dynamic cellular networks. We will briefly describe some of the biological context, including large-scale cancer biology and pathway studies that motivate some of our work, and then discuss aspects of biochemical network modelling and simulation from a bioengineering viewpoint. This will include reference to real biological systems that are key in human cancer studies, and synthetic networks engineered to emulate such real systems. Forward simulation is key to the bioengineering research in this area, and we will discuss current systems developed for both deterministic and stochastic simulation. The use of such models in synthetic gene circuit design will be highlighted with some notable examples and success stories. We will then discuss some recently initiated studies to develop statistical approaches that complement the traditional ODE based methods, here in the context of a specific cancer pathway related study. This will include questions of modelling strategies as well as statistical research issues of model fitting and validation, the development and use of synthetic, engineered gene networks that emulate the human pathways, and innovations needed to develop these models to problems involving many thousands of cells in which we aim to properly represent, estimate and understand individual cell-specific stochastic phenomena in complex networks.

Poster Abstracts

Dianne Bautista Ohio State University Department of Statistics [email protected]

“A Flexible Covariance Function Estimator of a Gaussian Process Model”

Thomas Bengtsson Bell Labs Department of Statistics [email protected]

“Dynamic Bias Estimation for Nonlinear Systems using the Ensemble Kalman Filter”

Sourabh Bhattacharya (presented by Tony O’Hagan) University of Sheffield Department of Probability and Statistics [email protected]

“A New Methodology for Bayesian Emulation of Complex Dynamic Models”

In recent times, complex computer models have received wide attention in scientific research. However, in order to make conventional statistical statements regarding the scientific research, many expensive runs of the computer model are usually needed. New statistical theories that are now appearing hold promise to alleviate the technical challenges. However, in cases where the underlying complex system is evolving with time, an effective theory for statistical analyses seems to be lacking. In this paper, we propose a novel Bayesian methodology that extends the existing methodologies to the case of dynamic complex systems.

Ariel Cintron-Arias SAMSI [email protected]

“Reproductive Numbers of Influenza in the US”

Tiangang Cui University of Auckland Department of Engineering Science [email protected]

“Geothermal Model Calibration -- A Sample Based Approach”

Keith Dalbey State University of New York at Buffalo Department of Mechanical and Aerospace Engineering [email protected]

“Uncertainty Quantification of Geophysical Mass Flows”

Karen Daniels North Carolina State University Department of Physics [email protected]

“Rates of Mixing and Segregation in Sheared Granular Materials”

James R. Gattiker Southampton University [email protected]

“EOS Calibrations by Fast Computing Machines”

This poster describes the calibration of a simulator to shock speed versus particle speed data. Flyer plate experiments are used to measure the particle speed of a substance after impact at a certain shock speed. The relationship between the two velocities can be used to infer the equation of state of the material. The experiments are expensive and difficult to arrange, so there is a strong preference for using simulated results. The data come from experiments on 26 different materials. There are nine inputs that affect the simulator results for these 26 materials. Our goal is to calibrate these inputs in order to obtain simulator results that match the data well. The output from each simulation is highly multivariate, so principal components are used to reduce the dimensionality. The resulting bases are modeled using Gaussian processes with a further discrepancy function to balance the calibration across the 26 material simulations.

John-Paul Gosling University of Sheffield Department of Probability and Statistics [email protected]

“Quantifying Uncertainty in the Carbon Flux for England and Wales”

Genetha Gray Sandia National Laboratories Computational Sciences & Mathematics Research [email protected]

“Designing Dedicated Experiments for Validation Activities of an Electrical Circuit Simulator”

A comprehensive design study of many of the complex systems in science and engineering may demand physical experimentation. However, increasing costs and decreasing resources are encouraging programs to limit the number of experiments and leverage existing databases. Instead, programs are turning to numerical modeling and simulation to aid their design efforts. Therefore, verification and validation (V&V) tools are critical for determining simulation-based confidence and predictive capabilities. In this poster, we will describe aspects of Sandia's V&V process on an electrical circuit simulator and explain the role of physical experiments in this process. Moreover, we will describe the design of experiments procedure used to create a data set that balances the resources for experiments, the simulator capabilities, and the required predictive confidence.

Mircea Grigoriu Cornell University Department of Civil Engineering [email protected]

“A Local Solution for Stochastic Transport Equations”

Andrew Gronewold Duke University Nicholas School of the Environment [email protected]

“Using Serial Dilution Tube Count Data to Improve Parameter Estimation in Bacteriological Water Quality Models”

Swathi Guda University of North Carolina-Chapel Hill Department of Mathematics [email protected]

“Rayleigh-Taylor Instability in a Sedimenting Suspension”

Daniel Henderson Newcastle University Department of Mathematics and Statistics [email protected]

“Bayesian Calibration of Biological Simulation Models”

Christel Hohenegger University of North Carolina-Chapel Hill Department of Mathematics [email protected]

“Diffusion-Induced Bias in Near-Wall Velocimetry”

Cari Kaufman SAMSI and NCAR [email protected]

“Covariance Tapering for Likelihood Based Estimation in Large Spatial Datasets”

Likelihood-based methods such as maximum likelihood, REML, and Bayesian methods are attractive approaches to estimating covariance parameters in spatial models based on Gaussian processes. Finding such estimates can be computationally infeasible for large datasets, however, requiring O(n^3) calculations for each evaluation of the likelihood based on n observations. I propose the method of covariance tapering to approximate the likelihood in this setting. In this approach, covariance matrices are “tapered,” or multiplied element-wise by a sparse correlation matrix. This produces matrices which can be manipulated using more efficient sparse matrix algorithms. I present two approximations to the Gaussian likelihood using tapering. Focusing on the particular case of the Matérn class of covariance functions, I give conditions under which tapered and untapered covariance functions produce equivalent (mutually absolutely continuous) measures for Gaussian processes on bounded domains. This allows me to evaluate the behavior of estimators maximizing the approximations to the likelihood under a bounded domain asymptotic framework. I present results from a simulation study showing agreement between the asymptotic results and what we observe for moderate but increasing sample sizes. Tapering methods can also be applied in fitting hierarchical Bayesian models involving large multivariate normal densities. Here, I discuss the particular application of making inference about the climatological (long-run mean) temperature difference between two sets of output from a computer model of global climate, run under two different land use scenarios.
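
The element-wise tapering step can be illustrated with a small sketch. The choices below are assumptions made for illustration and are not taken from the poster: an exponential covariance (a member of the Matérn class), a Wendland-type compactly supported taper, 20 equally spaced locations on a line, and hypothetical parameter values. The sketch only builds the tapered matrix and counts its zeros; it does not evaluate either likelihood approximation.

```python
import math


def exponential_cov(h, sigma2=1.0, rho=0.3):
    # Exponential covariance at distance h (illustrative parameter values).
    return sigma2 * math.exp(-h / rho)


def wendland1_taper(h, theta):
    # Compactly supported correlation function: exactly zero for h >= theta.
    if h >= theta:
        return 0.0
    r = h / theta
    return (1.0 - r) ** 4 * (1.0 + 4.0 * r)


def tapered_covariance(points, theta, sigma2=1.0, rho=0.3):
    # Element-wise product of the covariance matrix and the taper matrix.
    n = len(points)
    rows = []
    for i in range(n):
        row = []
        for j in range(n):
            h = abs(points[i] - points[j])
            row.append(exponential_cov(h, sigma2, rho) * wendland1_taper(h, theta))
        rows.append(row)
    return rows


points = [i / 19 for i in range(20)]           # hypothetical 1-D locations in [0, 1]
K = tapered_covariance(points, theta=0.2)
zero_fraction = sum(v == 0.0 for row in K for v in row) / len(points) ** 2
```

With this taper range most entries of the matrix are exactly zero, which is what allows sparse matrix algorithms to be used in place of dense O(n^3) factorizations.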

Michael Last National Institute of Statistical Sciences [email protected]

“Efficient Analysis of Complex Simulations: Pooled ANOVA”

Complex simulations are often analyzed as experiments, where the effects of changing settings are analyzed within traditional frameworks, such as ANOVA. Not all settings are important; identifying which settings are important, and what their effects are, is a key question simulations attempt to answer. Pooled ANOVA is an adaptive design technique that tests groups of settings together. Should no settings in a group have a significant effect, the entire group can be dismissed with only one test. Retesting is able to deal with effects of opposite sign possibly canceling each other out. Significant interactions can also be detected. Compared to similar methods (e.g. Sequential Bifurcation), Pooled ANOVA is able to work with weaker assumptions.
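
The idea of dismissing a whole group of settings with a single test can be sketched as follows. This is a deliberately simplified, noiseless bisection scheme written for illustration; it is not the Pooled ANOVA procedure itself. It uses no F-tests, does not implement the retesting step, and would wrongly dismiss a group whose effects cancel exactly. The additive toy simulator, its effect sizes, and all names are hypothetical.

```python
def make_simulator(effects, baseline=5.0):
    """Toy noiseless additive simulator: switching on setting i shifts the output by effects[i]."""
    def simulate(switched_on):
        return baseline + sum(effects.get(i, 0.0) for i in switched_on)
    return simulate


def screen(group, simulate, reference, tol=1e-9, log=None):
    """Return the active settings in `group`, testing the whole group first
    and splitting it in half only when the group shows an effect."""
    if log is not None:
        log.append(tuple(group))               # record every test that is run
    effect = simulate(group) - reference
    if abs(effect) <= tol:
        return []                              # whole group dismissed with one test
    if len(group) == 1:
        return list(group)
    mid = len(group) // 2
    return (screen(group[:mid], simulate, reference, tol, log)
            + screen(group[mid:], simulate, reference, tol, log))


n_settings = 64
simulate = make_simulator({3: 2.0, 43: -1.5})  # only two settings matter
tests = []
active = screen(list(range(n_settings)), simulate, simulate([]), log=tests)
```

When few settings are active, the number of tests recorded in `tests` is well below the number needed to test every setting individually.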

Fei Liu Duke University Institute of Statistics and Decision Sciences [email protected]

“Bayesian Functional Data Analysis with Application to Multiple Computer Model Validation”

Dharmesh Maniyar Aston University Department of Information Engineering [email protected]

“Dealing with Large Complex Models in Emulators: Guided Bayesian Committee Machine”

The Emulator methodology is well established and has been successfully applied to a number of interesting problems. However, because it is based on a Gaussian process representation of the model input-output mapping, the complexity of the models we can tackle has been limited by the scaling of the inference process in Gaussian processes, both in terms of the number of training runs used and the dimension of the input and output spaces.

In this poster we present an approach to dealing with emulators which can treat the case of complex models with varying behaviour in different regions of input space, and remains effective in high dimensional spaces. The essence of the idea is to exploit hierarchical non-linear visualisation approaches and to allow user interaction with the process. The inputs to the emulator are initially provided to a trained mixture based hierarchical visualization model, which segments the input space into regions with similar behaviour. These input regions can then be separately modelled using Gaussian processes with automatic relevance determination priors over the length scales of the inputs to reduce their complexity. We can then use the Bayesian committee machine framework to integrate the various regions to produce a consistent estimate of the output distribution. Emulator outputs can be used to guide the development of the visualization hierarchy using color or symbols or both. We propose several extensions in the future work section.

Amy Nail North Carolina State University Department of Statistics [email protected]

“Towards a Statistical CMAQ: A Prototype for Observation-Based Models to Assess Emission Control Strategies”

Abani Patra State University of New York at Buffalo Department of Mechanical and Aerospace Engineering [email protected]

“Uncertainty Quantification Approaches for Models of Hazardous Geophysical Mass Flows”

M. A. Perry, R. A. Bates, H. P. Wynn London School of Economics Department of Statistics [email protected], [email protected]

“A Finite Element Based Sensitivity Analysis Formulation for Robust Design of Piezoelectric Based Technologies”


Sensitivity analysis is a branch of numerical analysis that aims to quantify the effects that variability in the system parameters has on model output. Piezoelectric systems are routinely modeled using finite element analysis as part of the general design process. A finite element based sensitivity analysis formulation for piezoelectric media is developed here and implemented to simulate the operational and sensitivity characteristics of a piezoelectric distributed mode actuator (DMA). The work acts as a starting point for robustness analysis that can be used to aid the engineer in the process of designing a PZT-based technology.

Brian Reich, Curtis Storlie and Howard Bondell North Carolina State University Department of Statistics [email protected], [email protected], [email protected]

“Nonparametric Variance Decomposition via Bayesian Smoothing Splines”

Complex computer models are widely used in risk assessment of complex systems. These models can have very complex structure with high-dimensional inputs that are often uncertain. The goal of sensitivity analysis is to determine the input variables that explain the most uncertainty in the model output. One approach to deal with this is to allow the inputs to come from probability distributions. Ideally, the variable importance could be measured by variance decomposition. However, for complicated models computational costs make this infeasible. Typically, the model can only be evaluated at a small number of input values generated from the input distributions. We propose a multiple predictor, non-parametric regression model for these data. In this high-dimensional setting with highly correlated inputs, there are many models that fit the data equally well. Our Bayesian model averaging approach allows us to obtain confidence sets for variance importance measures that take into account model uncertainty.
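To make the variance-decomposition target concrete, here is a minimal sketch of a brute-force Monte Carlo estimate of a first-order variance share, Var(E[Y | X_i]) / Var(Y). It assumes independent inputs uniform on [0, 1] and a cheap model, which is precisely the situation the abstract says is usually unavailable; the function name and the toy model are hypothetical, and this is not the authors' smoothing-spline method.

```python
import random


def first_order_index(model, n_inputs, index, n_samples=20000, seed=0):
    """Estimate Var(E[Y | X_index]) / Var(Y) by a pick-freeze scheme.

    Assumes independent U(0, 1) inputs (an assumption of this sketch).
    """
    rng = random.Random(seed)
    ys, ys_frozen = [], []
    for _ in range(n_samples):
        a = [rng.random() for _ in range(n_inputs)]
        b = [rng.random() for _ in range(n_inputs)]
        b[index] = a[index]  # freeze input `index`, resample all the others
        ys.append(model(a))
        ys_frozen.append(model(b))
    mean_y = sum(ys) / n_samples
    mean_f = sum(ys_frozen) / n_samples
    # The covariance of the paired outputs estimates Var(E[Y | X_index]).
    cov = sum((y - mean_y) * (z - mean_f) for y, z in zip(ys, ys_frozen)) / n_samples
    var = sum((y - mean_y) ** 2 for y in ys) / n_samples
    return cov / var


def toy_model(x):
    # Hypothetical additive model; the exact shares are 0.2 and 0.8.
    return x[0] + 2.0 * x[1]


share_first = first_order_index(toy_model, n_inputs=2, index=0)
share_second = first_order_index(toy_model, n_inputs=2, index=1)
```

Each estimate needs tens of thousands of model runs, which illustrates why a regression surrogate fitted to a small design is attractive when the simulator is expensive.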

Matt Taddy University of California-Santa Cruz Department of Applied Math and Statistics [email protected]

“Optimization with a Gaussian Process Oracle”

Guillaume Vernieres SAMSI [email protected]

“Assimilation of altimetry data into a quasigeostrophic model of the Kuroshio south of Japan”

Darren Wilkinson Newcastle University Department of Mathematics & Statistics [email protected]

“Bayesian Calibration of Biological Simulators”

Steve Wojtkiewicz University of Minnesota Department of Civil Engineering [email protected]

“Ensemble Uncertainty Quantification Methods”

Lingchong You Duke University Biomedical Engineering, Genome Sciences and Policy [email protected]

Mike West Duke University Institute of Statistics and Decision Sciences, Biostatistics and Bioinformatics [email protected]

“Modelling dynamic cellular networks”

We will present tutorial-level discussion of concepts, models and methods that feature in our work on modelling, simulation, analysis and design of dynamic cellular networks. We will briefly describe some of the biological context, including large-scale cancer biology and pathway studies that motivate some of our work, and then discuss aspects of biochemical network modelling and simulation from a bioengineering viewpoint. This will include reference to real biological systems that are key in human cancer studies, and synthetic networks engineered to emulate such real systems. Forward simulation is key to the bioengineering research in this area, and we will discuss current systems developed for both deterministic and stochastic simulation. The use of such models in synthetic gene circuit design will be highlighted with some notable examples and success stories. We will then discuss some recently initiated studies to develop statistical approaches that complement the traditional ODE-based methods, here in the context of a specific cancer pathway related study. This will include questions of modelling strategies as well as statistical research issues of model fitting and validation, the development and use of synthetic, engineered gene networks that emulate the human pathways, and innovations needed to extend these models to problems involving many thousands of cells in which we aim to properly represent, estimate and understand individual cell-specific stochastic phenomena in complex networks.

Aijun Zhang University of Michigan Department of Statistics [email protected]

“Space-filling Designs with Minimum Energy”

Tonglin Zhang Purdue University Department of Statistics [email protected]

“Loglinear Residual Tests of Moran's I Autocorrelation and Their Applications to Kentucky Breast Cancer Data”

Moran's I is the most widely used and the most frequently cited test statistic in the spatial statistical literature. This research bridges the permutation test of Moran's I to the residuals of a loglinear model under the asymptotic normality assumption. It provides versions of Moran's I based on Pearson residuals (IPR) and deviance residuals (IDR) so that they can be used to test for spatial clustering while at the same time accounting for potential covariates and heterogeneous population sizes. Our simulations showed that both IPR and IDR are effective in accounting for heterogeneous population sizes. The tests based on IPR and IDR are applied to a set of loglinear models for early-stage and late-stage breast cancer with socioeconomic and access-to-care data in Kentucky. The results showed that socioeconomic and access-to-care variables can sufficiently explain spatial clustering of early-stage breast carcinomas, but these factors cannot explain the clustering of late-stage carcinomas. For this reason, we used local spatial association terms and located four late-stage breast cancer clusters that could not be explained. The results also confirmed our expectation that a high screening level would be associated with a high incidence rate of early-stage disease, which in turn would reduce late-stage incidence rates.
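For reference, the classical Moran's I statistic that these residual-based versions build on can be computed as in the sketch below. The function name and the toy neighbour structure are hypothetical. The residual-based tests in the abstract would, in spirit, pass model residuals in place of raw values, but fitting the loglinear model and the exact normalisation used there are not shown.

```python
def morans_i(values, weights):
    """Classical Moran's I for `values` under a spatial weight matrix.

    `weights[i][j]` is the spatial weight between units i and j
    (zero on the diagonal by convention).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between neighbouring units.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    sum_sq = sum(d * d for d in dev)
    return (n / total_weight) * (cross / sum_sq)


# Four units on a line, each a neighbour of the adjacent ones.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
trend = morans_i([1.0, 2.0, 3.0, 4.0], chain)          # positive: smooth trend
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)  # negative: checkerboard
```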

C. Joint Engineering and Methodology Workshop October 26-27, 2006

Thursday, October 26, 2006 SAMSI Room 104

8:45-9:15 AM Registration and Continental Breakfast

9:15-9:30 AM Welcome Jim Berger, SAMSI

9:30-10:45 AM Derek Bingham, Simon Fraser University “Experiment Designs for Model Calibration”

10:45-11:00 AM Break

11:00-12:15 PM Leslie M. (Lisa) Moore, Los Alamos National Laboratory “Computer Experiment Designs to Achieve Multiple Objectives”

12:15-1:30 PM Lunch

1:30-3:15 PM Discussion Session: Functional Outputs Jim Berger, SAMSI and Jerry Sacks, NISS

3:15-4:30 PM Break

4:30-5:30 PM SAMSI Distinguished Lecture (MCNC Auditorium, 3021 Cornwallis Rd.) Anthony O’Hagan, University of Sheffield “Managing Uncertainty in Complex Models”

5:30-6:30 PM Reception

Friday, October 27, 2006 SAMSI Room 104

9:00-9:30 AM Registration and Continental Breakfast


9:30-10:45 AM Steve Wojtkiewicz, University of Minnesota “Ensemble Uncertainty Quantification”

10:45-11:00 AM Break

11:00-12:15 PM Herbie Lee, University of California – Santa Cruz “Bayesian Treed Gaussian Process Models and Sequential Experimental Design”

12:15-1:30 PM Lunch

1:30-2:45 PM Tom Santner, Ohio State University “Sequential Design of Computer Experiments for Constrained Optimization”

2:45-3:00 PM Break

3:00-4:30 PM Final Open Discussion

Speaker Abstracts

Derek Bingham Simon Fraser University Department of Statistics and Actuarial Science [email protected]

“Experiment Designs for Model Calibration”

Physical experimentation is an expensive endeavor. Computer experiments, which are often a less-expensive alternative, provide a reasonably accurate representation of physical experimental results. This work proposes an approach to designing experiments when the researcher has a limited amount of data collected on each of the physical and computer experiments and wishes to allocate future experiments to be tested at inputs for the physical experiments, the computer experiments, or experiments to be run at combinations of both physical and computer models. This hybrid experimental situation relies on Gaussian process formulations for the modeling of the experimental outputs as functions of the experimental inputs. We demonstrate the methodology with both simulated and actual experimental situations.

Herbie Lee University of California, Santa Cruz Department of Applied Mathematics & Statistics [email protected]

“Bayesian Treed Gaussian Process Models and Sequential Experimental Design”

Treed Gaussian process models are a computationally efficient way of capturing nonstationarity in a spatial process, such as the output of a computer model. R code is available from CRAN in the library tgp. The Bayesian perspective allows full accounting for uncertainty, which is also useful for designing sequential computer experiments. This talk will give a brief overview of the motivating example from aeronautical engineering that led to the development of these models and code, and will also include a demonstration of the R code.

Dr. Leslie M. (Lisa) Moore Los Alamos National Laboratory Statistical Sciences Group [email protected]

“Computer Experiment Designs to Achieve Multiple Objectives”

Simulator codes are a basis for inference in many complex problems including weapons performance, materials aging, infrastructure modeling, nuclear reactor production, and manufacturing process improvement. Goals of computer experiments include sensitivity analysis to gain understanding of the input space and construction of an emulator that may form a basis for uncertainty analysis or prediction. Orthogonal arrays, or highly fractionated factorial designs, and near-orthogonal arrays are used for computer experiments for sensitivity analyses. Latin hypercube samples, possibly selected by a space-filling criterion, are in common use when Gaussian spatial processes are the modeling paradigm or uncertainty analysis is the objective. Orthogonal-array based Latin hypercube designs are used to achieve both objectives. Improvement in terms of obtaining a space-filling design will be demonstrated for orthogonal-array based Latin hypercube designs. The impact of competing experiment objectives will be discussed in terms of loss of efficiency in sensitivity analysis conducted with data from a Latin hypercube design.
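As a point of reference, a plain Latin hypercube sample on the unit cube can be generated as in the sketch below. It is a minimal illustration with hypothetical names; it applies no space-filling criterion and is not the orthogonal-array based construction discussed in the talk.

```python
import random


def latin_hypercube(n_points, n_dims, seed=0):
    """Return n_points design points in [0, 1)^n_dims.

    In every dimension, each of the n_points equal-width strata
    contains exactly one point.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_points))
        rng.shuffle(strata)  # independent random permutation per dimension
        # Place one point uniformly at random inside each stratum.
        columns.append([(s + rng.random()) / n_points for s in strata])
    # Transpose so that each row is one design point.
    return [[columns[d][i] for d in range(n_dims)] for i in range(n_points)]


design = latin_hypercube(n_points=5, n_dims=3)
```

The one-point-per-stratum property gives good one-dimensional coverage for every input, but it does not by itself guarantee good coverage in two or more dimensions, which is the motivation for combining Latin hypercubes with orthogonal arrays or a space-filling criterion.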

Anthony O'Hagan University of Sheffield Department of Probability and Statistics [email protected]

“Managing Uncertainty in Complex Models”

Mathematical models are used in almost every sphere of science, technology, commerce, government and industry. They are used to predict, and to gain understanding of, complex processes. Model predictions are increasingly relied upon in situations where direct measurement is impractical, but increasingly the users are demanding to know how accurate those predictions are. Quantifying the uncertainties in model outputs is not simple, partly because there are many different sources of uncertainty, and partly because the models themselves are often large and complex. Uncertainty should in principle be reduced when we are able to compare model predictions with field observations, but in practice it is difficult for the same reasons.

This talk will describe modern methods, based on Bayesian statistics, to facilitate operating with uncertainties in complex models. They can be orders of magnitude more efficient than more traditional methods, in terms of the numbers of model runs required to obtain sound answers. They also encompass uncertainty propagation and assimilation of observational data within a single coherent framework.

The talk will outline the current state of the art and areas of active research.


Steve Wojtkiewicz University of Minnesota Civil Engineering Department [email protected]

“Ensemble Uncertainty Quantification”

This presentation addresses the need for the development of uncertainty quantification algorithms that leverage information from one realization to another. Although the size of the computational models used in many engineering and scientific simulations is extremely large, i.e. millions of equations, the uncertainty to be analyzed is oftentimes very localized to small regions of the model. One example of this manifests itself in a study of the effects of damping in connections between structural dynamic subsystems. While the full analysis model for this system is on the order of a million degrees of freedom, the number of nodes involved in an uncertainty analysis of the connection is on the order of ten. Here, recent efforts to explore, expand, and develop UQ methods that exploit this localization of uncertainty will be discussed.

Algorithms for linear algebraic and dynamic systems have been developed and will be outlined. In addition, their efficacy will be demonstrated through several examples. These algorithms utilize linear algebra techniques for low rank matrix updates, Sherman-Morrison-Woodbury formulas, and their dynamical analogs. The computational procedure consists of a small number of full system runs, equal to the number of nodes involved in the connections in the abovementioned scenario. The solutions from this small number of runs are then used to construct a solution update procedure where the remaining computation for each realization involves a system solution of this greatly reduced size.

After these initial calculations, the ratio of the cost of a full system solution to the cost of each subsequent realization is on the order of the ratio of the number of degrees of freedom of the full system model to the number involving uncertainty. Thus, one can expect speedups of several orders of magnitude for the subsequent realizations. In addition, the system updates, due to the small systems being solved, can be performed using a wider variety of computing resources.

It is foreseen that the greatly increased number of realizations can be used to obtain greater fidelity in failure assessments (smaller failure probabilities) and/or to address the epistemic uncertainty issue by considering alternate plausible uncertainty models, including interval models, for the parameters being studied.
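The low-rank update idea can be illustrated in its simplest form, a rank-one Sherman-Morrison update of a static linear system. The sketch below assumes the uncertain parameter enters as A + theta * u * v^T; the helper names and the small test matrix are hypothetical, and the talk's algorithms (higher-rank Woodbury updates and dynamical analogs) are more general.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def updated_solution(y, z, v, theta):
    """Solve (A + theta * u * v^T) x = b given y = A^{-1} b and z = A^{-1} u.

    Only vector operations are needed for each new value of theta.
    """
    scale = theta * dot(v, y) / (1.0 + theta * dot(v, z))
    return [yi - scale * zi for yi, zi in zip(y, z)]


# Full-system solves, done once (hypothetical 3-by-3 stand-in for a large model).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
u = [0.0, 0.0, 1.0]  # the uncertainty touches only the last degree of freedom
v = [0.0, 0.0, 1.0]
y = solve(A, b)
z = solve(A, u)

# Cheap per-realization updates for different values of the uncertain parameter.
realizations = [updated_solution(y, z, v, theta) for theta in (0.5, 2.0)]
```

Here two full solves are reused for every realization; with k uncertain degrees of freedom the same pattern needs on the order of k full solves plus a small k-by-k solve per realization.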

D. Biosystems Modeling Workshop March 5-7, 2007

Monday – March 5, 2007 Radisson Hotel RTP Room H, 3rd Floor

8:00-8:50 AM Registration and Continental Breakfast

8:50-9:00 AM Welcome Jim Berger, SAMSI

9:00-10:00 AM Tutorial on Biochemical Network Modeling Eberhard Voit, Georgia Institute of Technology

10:00-10:30 AM Break

10:30-11:30 AM Tutorial on Stochastic Kinetic Models Greg Rempala, University of Louisville

11:30-12:30 PM Tutorial on Drawing Together Aspects of Systems Engineering and Systems Biology Herschel Rabitz, Princeton University

12:30-1:45 PM Lunch, Room FG

1:45-2:45 PM “Dynamical Properties of Biochemical Reaction Networks” Maya Mincheva, University of Wisconsin

2:45-3:45 PM “Computational Systems Immunology” Thomas Kepler, Duke University

3:45-4:15 PM Break

4:15-5:00 PM Poster Presentation Introduction Session

6:30-8:30 PM Poster Session and Reception Room AB, 2nd Floor (Poster presenters please arrive early - posters should be set up by 6:15 PM)

Tuesday – March 6, 2007 Radisson Hotel RTP Room H, 3rd Floor

8:30-9:00 AM Registration and Continental Breakfast

9:00-10:00 AM “Efficient Stochastic Search Algorithms for ‘Large p’ Regression with Dependent Covariates” Adrian Dobra, University of Washington

10:00-11:00 AM “On Some Stochastic Oscillator Problems Related to Biological Rhythms” Guillaume Bonnet, University of California-Santa Barbara

11:00-11:30 AM Break

11:30-12:30 PM Panel Discussion: Top-Down versus Bottom-Up Approaches to Biosystems Modeling Panelists: John Yin, University of Wisconsin Tom Knudsen, University of Louisville

12:30-1:45 PM Lunch, Room FG

1:45-3:50 PM New Researcher Session (20 minutes each)

“Diffusion Ratchets and Molecular Motors” John Fricks, Pennsylvania State University

“The Reaction-Diffusion Master Eq. is an Asymptotic Approximation of Diffusion to a Small Target” Sam Isaacson, University of Utah

“Phenotypic Clustering of Yeast Mutants Based on Kinetochore Microtubule Dynamics” Khuloud Jaqaman, Scripps Research Institute

“Multiscale Methods in Heat Shock Model” Hye-Won Kang, University of Wisconsin

“The Effect of Reactant Size on Stochastic Chemical Kinetics” Sotiria Lampoudi, University of California-Santa Barbara

“A Chemical Kinetic Model for Transcriptional Elongation” Richard Yamada, Cornell University

3:50-4:20 PM Break

4:30-5:30 PM SAMSI Distinguished Lecture “Stochastic Chemical Kinetics” Daniel Gillespie, Gillespie Consulting

5:30-6:30 PM Lecture Reception, Room FG

Wednesday – March 7, 2007 Radisson Hotel RTP Room H, 3rd Floor

8:30-9:00 AM Registration and Continental Breakfast

9:00-10:00 AM “Sophisticated Statistical Mechanics of Sloppy Models” Kevin Brown, Harvard University

10:00-11:00 AM “Model Reduction Techniques for Biochemical Networks” Lea Popovic, Cornell University

11:00-11:30 AM Break


11:30-12:30 PM Panel Discussion: Key Mathematical and Statistical Challenges in Computational Systems Biology Panelists: Linda Petzold, University of California-Santa Barbara Michael Reed, Duke University Darren Wilkinson, University of Newcastle

12:30-1:45 PM Lunch, Room FG

1:45-2:45 PM “Stochastic Challenges in Single-Molecule Biophysics” Samuel Kou, Harvard University

2:45-3:00 PM Break

3:00-4:30 PM Discussion of Working Group Projects and Activities

Speaker Abstracts

Guillaume Bonnet University of California-Santa Barbara Department of Statistics and Applied Probability [email protected]

“On Some Stochastic Oscillator Problems Related to Biological Rhythms”

Biological systems often present regular oscillatory characteristics. Circadian, neurological and cardiac rhythms are just three of the most familiar examples of such phenomena. In order to understand the underlying system, deterministic mathematical models have been proposed and analyzed, following the seminal works of Winfree, Kuramoto and Strogatz in particular. However, empirical observations, in particular in molecular networks, show that stochastic models should be more appropriate. There exists a rapidly growing literature on such models, some of which have a long history in mechanical engineering and physics. Central issues include the sustainability and precision of the oscillations. For biological systems, the synchronization behavior among coupled systems of oscillators is of crucial importance as well.

In this talk, I will first attempt to give an overview of the types of models, known results, and empirical observations from various fields. In an attempt to give a rigorous formulation to such problems, I will then show how probabilistic tools, in particular the theory of random dynamical systems (in the sense of Arnold), can give rigorous explanations of some of the experimental observations, can point to some overlooked issues, and more generally give the right framework for further theoretical investigations of biologically motivated problems. Moving towards more realistic models of stochastic oscillators for chemical networks, I will also present some ongoing related work on a stochastic Lotka-Volterra system.

Kevin Brown Harvard University Department of Molecular and Cellular Biology [email protected]

“Sophisticated Statistical Mechanics of Sloppy Models”


Models of intracellular protein signaling networks can have tens to hundreds of dynamical variables and as many, if not more, rate parameters drawn from chemical kinetics. Almost all of these parameters are unknown and data are often sparse, making the models only barely determined. There is a famous aphorism in physics: “Give me four parameters and I can fit an elephant. Give me five and I can make it wag its tail.” When one considers that even simple models may have tens or hundreds of parameters and such models get more complex very quickly, an attempt to generate meaningful and useful models of biological regulation appears even more daunting. My collaborators and I use methods from statistical physics - ensemble theory, spectral decomposition, Monte Carlo simulation - to analyze and understand these large, poorly constrained models. Our insights into the cost surface geometry of signaling networks have revealed them to be only one representative of a new universality class of multiparameter models constrained by data; other examples include fitting sums of exponentials to radioactive decays, varying parameters of variational wave functions used in quantum Monte Carlo calculations, and fitting empirically derived atomic potentials.

Adrian Dobra University of Washington Department of Statistics [email protected]

“Efficient Stochastic Search Algorithms for ‘Large p’ Regression with Dependent Covariates”

We describe and compare novel stochastic search methods for exploring high-dimensional model spaces. Key issues involve estimation, prediction and variable selection in the presence of dependencies among candidate predictors. We show how to transform stochastic algorithms for regression model search into corresponding methods for covariance estimation in Gaussian models. We illustrate our approaches using simulated data as well as real datasets from breast cancer and heart genomics.

John Fricks Pennsylvania State University Department of Statistics [email protected]

“Diffusion Ratchets and Molecular Motors”

One commonly used model for linearly progressive biomolecular motors in the biophysics literature is the Brownian ratchet mechanism. In this talk, a precise mathematical formulation of a Brownian ratchet (or more generally a diffusion ratchet) will be given via an infinite system of stochastic differential equations with reflection. This formulation will be seen to arise in the weak limit of a natural discrete time/space model that is used to describe motor dynamics in the literature. Numerical techniques will be provided to compute asymptotic quantities such as asymptotic velocity, effective diffusivity, and the randomness parameter for this model and other closely related models.

SAMSI Distinguished Lecture

Daniel Gillespie Gillespie Consulting [email protected]

“Stochastic Chemical Kinetics”

The time evolution of a well-stirred chemically reacting system is traditionally modeled by a set of coupled ordinary differential equations called the reaction rate equation (RRE). The resulting picture of continuous deterministic evolution is, however, valid only for infinitely large systems. That condition is usually well approximated in laboratory test tube systems. But in biological systems formed by single living cells, the small population numbers of some reactant species can result in dynamical behavior that is noticeably discrete rather than continuous, and stochastic rather than deterministic. In that case, a more physically accurate mathematical modeling is obtained by using the machinery of Markov process theory, specifically, the chemical master equation (CME) and the stochastic simulation algorithm (SSA). After reviewing the theoretical foundations of stochastic chemical kinetics, we will describe a way to approximate the SSA by a faster simulation procedure, and then show how this way also provides a logical bridge between the CME/SSA description and the RRE description.
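A minimal sketch of the stochastic simulation algorithm in its direct-method form is given below. The function signature, the representation of reactions as (propensity function, state change) pairs, and the decay example are assumptions made for illustration; the faster approximate procedure mentioned in the abstract is not shown.

```python
import math
import random


def ssa(initial_state, reactions, t_end, seed=0):
    """Simulate one trajectory with the direct-method SSA.

    `reactions` is a list of (propensity_function, state_change) pairs,
    where propensity_function maps the current state to a rate and
    state_change is added to the state when the reaction fires.
    """
    rng = random.Random(seed)
    t, state = 0.0, list(initial_state)
    path = [(t, tuple(state))]
    while True:
        propensities = [rate(state) for rate, _ in reactions]
        total = sum(propensities)
        if total <= 0.0:
            break  # no reaction can fire any more
        # Time to the next reaction is exponential with rate `total`.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Choose which reaction fires, with probability propensity / total.
        threshold = rng.random() * total
        cumulative = 0.0
        for propensity, (_, change) in zip(propensities, reactions):
            cumulative += propensity
            if threshold < cumulative:
                break
        state = [x + d for x, d in zip(state, change)]
        path.append((t, tuple(state)))
    return path


# Hypothetical example: first-order decay S -> 0 with rate constant 1.0.
decay = [(lambda s: 1.0 * s[0], [-1])]
trajectory = ssa([50], decay, t_end=100.0, seed=1)
```

With only 50 molecules the trajectory is a visibly discrete, random staircase, in contrast to the smooth exponential curve that the reaction rate equation would give for the same system.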

Samuel Isaacson University of Utah Department of Mathematics [email protected]

“The Reaction-Diffusion Master Eq. Is an Asymptotic Approximation of Diffusion to a Small Target”

We will present several mathematical models for studying reaction-diffusion processes wherein both noise in the chemical reaction process and diffusion of individual molecules may be important. In particular, we will examine the relation between the reaction-diffusion master equation model of spatially distributed stochastic chemical kinetics and models that track individual particles. Our analysis will demonstrate the importance of modeling point binding, equivalently binding to a small target, in understanding the reaction-diffusion master equation.

Khuloud Jaqaman Scripps Research Institute Cell Biology [email protected]

“Phenotypic Clustering of Yeast Mutants Based on Kinetochore Microtubule Dynamics”

To test the hypothesis that kinetochore proteins regulate kinetochore microtubule (kMT) dynamics, we measured single kMT dynamics in the budding yeast S. cerevisiae and compared them between wild type (WT) and strains carrying kinetochore protein mutations. We established autoregressive moving average (ARMA) model parameters as a unique and complete set of descriptors of kMT dynamics that allowed us to distinguish between subtle phenotypes associated with gene deletions and temperature-sensitive mutations. ARMA models extracted the dependence of kMT length on its history and on a related white noise series which embodied the stochastic nature of kMT dynamics. Multiple kMT length series from each condition were fitted together, taking into account observational error and missing observations, to achieve robust parameter estimation. We also estimated the variance-covariance matrices of ARMA descriptors and used them to compare descriptors between different conditions within a statistical hypothesis testing framework. The p-values from the statistical tests revealed which conditions had different dynamics, and also provided us with a proximity measure that we used for clustering kMT dynamics. This allowed us to classify kinetochore proteins within functional groups. We found that kinetochore proteins do indeed regulate kMT dynamics. For instance, kMT dynamics in the mutants okp1-5 and kip3Δ are different from those in WT. Furthermore, we found that the proteins Ipl1p, Dam1p and Kip3p form one functional group, where the dynamics resulting from their mutation are equivalent and significantly different from dynamics in WT. In addition to their classification power, ARMA descriptors are ideal intermediate statistics for matching experimental and simulated kMT dynamics for calibrating stochastic mechanistic models of kMT regulation by kinetochore proteins.
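For readers unfamiliar with the model class, a generic ARMA(p, q) model for a length series can be written as below. The symbols are assumptions introduced here (x_t for the kMT length at time step t, a_i and b_j for the autoregressive and moving average coefficients, and sigma^2 for the white noise variance); the abstract does not give its own notation, and its treatment of observational error and missing observations is not represented.

```latex
x_t \;=\; \sum_{i=1}^{p} a_i \, x_{t-i} \;+\; \varepsilon_t \;+\; \sum_{j=1}^{q} b_j \, \varepsilon_{t-j},
\qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2) \ \text{independently for each } t .
```

The first sum carries the dependence of the series on its own history, and the remaining terms carry the dependence on the white noise series, so the descriptor set for one condition would be (a_1, ..., a_p, b_1, ..., b_q, sigma^2).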

Hye-Won Kang University of Wisconsin-Madison Department of Mathematics [email protected]

“Multiscale Methods in Heat Shock Model”

Thomas Kepler Duke University Departments of Immunology and Biostatistics & Bioinformatics [email protected]

“Computational Systems Immunology”

Samuel Kou Harvard University Department of Statistics [email protected]

“Stochastic Challenges in Single-Molecule Biophysics”

Recent advances in nanotechnology allow scientists to follow a biological process on a single-molecule basis. These advances also raise many challenging stochastic modeling and statistical inference problems. First, by zooming in on single molecules, recent nano-scale experiments reveal that some classical stochastic models derived from oversimplified assumptions are no longer valid. Second, the stochastic nature of the experimental data and the presence of latent processes greatly complicate the statistical inference. In this talk we will use (i) the modeling of enzymatic reaction pathways and (ii) the modeling of subdiffusion phenomena in enzymatic conformational fluctuation to illustrate the stochastic challenges in single-molecule biophysics.

Sotiria Lampoudi University of California-Santa Barbara Department of Computer Science [email protected]

“The Effect of Reactant Size on Stochastic Chemical Kinetics”


The cytoplasm is a crowded place, and important biochemical reactions frequently involve very large molecules. Understanding the effects of crowding and of the size of reactants on the rates of reactions is very important. The mass action formalism, reflected in the combinatorial product of reactant concentrations and populations in the reaction rate equations and the propensity functions of the Stochastic Simulation Algorithm, respectively, breaks down in situations where the volume excluded by the reactants in the system is large. In this talk I will present recent results on the effect of excluded volume in the case of the annihilation reaction (A+A) in a one-dimensional, finite system of hard sphere molecules of non-zero size moving ballistically. I will also discuss the implications of the one-dimensional result to higher dimensions and more complicated scenarios.

This is joint work with Dan Gillespie and Linda Petzold.

Maya Mincheva University of Wisconsin-Madison Department of Mathematics [email protected]

“Dynamical Properties of Biochemical Reaction Networks”

An important problem in modern cellular biology is to understand the dynamics of interactions in complex networks of genes, proteins and enzymes.

Mathematical models of biochemical reaction networks result in large systems of differential equations that are usually nonlinear and have many unknown parameters. Because of these unknown parameters (e.g., reaction rate constants), direct numerical simulation of the dynamics is practically impossible.

On the other hand, important properties of the biochemical systems are determined only by the network structure, and do not depend on the unknown parameters. We describe how a bipartite graph associated with the biochemical reaction network can be used to predict its dynamical properties, such as multistability and oscillations. This analysis generalizes the positive/negative feedback cycle conditions for instability.

In more general models, instabilities can be caused by diffusion or delays. Similar network conditions relate the structure of the same bipartite graph to delay-induced oscillations or Turing instability.

Lea Popovic Cornell University Department of Mathematics [email protected]

“Model Reduction Techniques for Biochemical Networks”

In this talk I will emphasize theoretical tools for analysis of biochemical networks which use the multi-scale nature of such systems. I will describe possible ways of reducing the dimensionality of the system. I will also discuss an approach to analyzing the initial state of the system.


Herschel Rabitz Princeton University Department of Chemistry [email protected]

“Drawing Together Aspects of Systems Engineering and Systems Biology”

Systems engineering is a rich subject, and systems biology is expected to be equally rich given the broad scope of phenomena involved. Some specific aspects of systems biology will be discussed covering (a) control of bionetworks, (b) experimental design of genetic circuits, (c) optimal parameter identification, and (d) an algorithm for accelerating the discovery of efficacious molecular agents. These advances are a work in progress, and a summary of the activities will be presented.

Grzegorz A Rempala University of Louisville Department of Mathematics [email protected]

“Tutorial on Stochastic Models in Chemical Kinetics”

The talk shall give a brief overview of the stochastic models of chemical kinetics. It will outline in particular the classical Markov model of chemical reactions under the law of mass action as well as its various approximations obtained by, e.g., applying the law of large numbers and the functional central limit theorem. Some basic representations of the stochastic kinetic models helpful in developing simulation schemes shall also be discussed.
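One widely used representation of this kind is the random time change form of the Markov model, sketched below. It is offered as an illustration and may not be the representation used in the talk; the notation is an assumption introduced here, with X(t) the vector of molecule counts, nu_k the state change of reaction k, lambda_k its mass action propensity, Y_k independent unit-rate Poisson processes, and N a system-size (volume) parameter.

```latex
X(t) \;=\; X(0) \;+\; \sum_{k} \nu_k \, Y_k\!\left( \int_0^t \lambda_k\big(X(s)\big)\, ds \right).
```

Under the usual scaling, in which the propensities satisfy \lambda_k(Nx) \approx N \tilde{\lambda}_k(x), the law of large numbers for the Poisson processes gives the deterministic reaction rate equation as the limit of the concentrations X(t)/N:

```latex
\frac{dx}{dt} \;=\; \sum_{k} \nu_k \, \tilde{\lambda}_k\big(x(t)\big).
```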

Eberhard Voit Georgia Institute of Technology Department of Biomedical Engineering [email protected]

“Tutorial on Biochemical Network Modelling”

In this tutorial, I will present different approaches to modeling and understanding biochemical systems and discuss research topics that are presently not addressed in a satisfactory manner and await further investigation.

Richard Yamada Cornell University Department of Applied Mathematics [email protected]

“A Chemical Kinetic Model for Transcriptional Elongation”

Transcription is the first step in gene expression, and the step at which most gene regulation occurs. Transcription consists of 3 distinct stages: initiation, elongation, and termination. Out of all these steps, transcriptional elongation is the step most amenable to a quantitative description; experimental results in the past 5 years have made it possible to test predictions from quantitative models.

In this talk, a chemical kinetic model of the transcriptional elongation dynamics of RNA polymerase along a DNA sequence is introduced. The proposed model governs the discrete movement of the RNA polymerase along a DNA template, with no consideration given to elastic effects. The model’s novel concept is a “look-ahead” feature, in which nucleotides bind reversibly to the DNA prior to being incorporated covalently into the nascent RNA chain. Computational results, via the Gillespie algorithm, for the proposed model are presented for specific DNA sequences used in actual single-molecule experiments of RNA polymerase along DNA. By replicating the data analysis algorithm from the experimental procedure, the computational model produces velocity histograms, enabling direct comparison with these published experimental results. Parameter estimation techniques are an important part of this research, both to find an optimal set of model parameters and to interpret their significance. We discuss our attempts to use Bayesian methods, using our stochastic model, to find parameters from single trace time data.
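As a rough illustration of how the Gillespie algorithm can drive a discrete stepping model, the following minimal sketch simulates a single particle moving along a template. It is not the authors' model: it has no look-ahead feature, and the function name `gillespie_elongation`, the per-nucleotide rates, and the `backtrack_rate` parameter are all hypothetical choices made for this example.

```python
import random


def gillespie_elongation(sequence, rates, backtrack_rate=0.0, seed=0):
    """Simulate one stepping trajectory; return (times, positions).

    Hypothetical toy model: at template position `pos` the particle steps
    forward with propensity rates[sequence[pos]] and, if pos > 0, backward
    with propensity backtrack_rate.
    """
    rng = random.Random(seed)
    t, pos = 0.0, 0
    times, positions = [t], [pos]
    while pos < len(sequence):
        forward = rates[sequence[pos]]
        backward = backtrack_rate if pos > 0 else 0.0
        total = forward + backward
        # Waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        # Pick which event fires with probability proportional to its propensity.
        if rng.random() * total < forward:
            pos += 1
        else:
            pos -= 1
        times.append(t)
        positions.append(pos)
    return times, positions


# Assumed, purely illustrative rates (events per second) for each template base.
rates = {"A": 10.0, "C": 12.0, "G": 8.0, "T": 9.0}
times, positions = gillespie_elongation("ACGTACGT", rates, backtrack_rate=1.0)
```

Velocity histograms of the kind mentioned above could then be built from many such trajectories by differencing `positions` over fixed time windows.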

This research is joint work with Professor Charles Peskin (CIMS/NYU).

E. Joint SAMSI/MUCM Mid-Program Workshop April 2-3, 2007

Monday – April 2, 2007 NISS Building, Room 104

8:45-9:15 AM Registration and Continental Breakfast

9:15-9:30 AM Welcome Jim Berger, SAMSI

MORNING SESSION Chair: Henry Wynn, London School of Economics

9:30-11:15 AM “Calibration of a 3-D Air Quality Model” Serge Guillas, Georgia Institute of Technology

“State-Space Modelling of Soil-Moisture” Jonathan Rougier, SAMSI and University of Bristol

11:15-11:45 AM Break

11:45-12:30 PM “Efficient Emulators of Computer Experiments Using Covariance Tapering” Cari Kaufman, SAMSI

12:30-2:00 PM Lunch

AFTERNOON SESSION

Chair: Robert Wolpert, Duke University

2:00-3:45 PM “Global Optimization Using Pattern Search and Treed Gaussian Processes” Herbie Lee, University of California-Santa Cruz

“Hierarchical Calibration” James Gattiker, Southampton University

3:45-4:00 PM Break

4:00-5:00 PM “Predicting Geological Mass Flows from Field Data” Bruce Pitman, University of Buffalo-SUNY Elaine Spiller, SAMSI

Tuesday – April 3, 2007 NISS Building, Room 104

8:45-9:15 AM Registration and Continental Breakfast

MORNING SESSION Chair: Susie Bayarri, University of Valencia and SAMSI

9:15-11:00 AM “Analysing Input and Structural Uncertainty of a Hydrological Model with Stochastic Time-Dependent Parameters” Peter Reichart, EAWAG and SAMSI

“Analysis of Oscillatory Patterns of Disease Transmission” Ariel Cintron, SAMSI and North Carolina State University

11:00-11:15 AM Break

11:15-1:00 PM “Emulation of Dynamic Computer Models” John Paul Gosling, University of Sheffield

“Bayesian Kalman Filter for Emulation of Complex Engineering Computer Models” Gentry White, North Carolina State University and SAMSI

1:00-2:15 PM Lunch

AFTERNOON SESSION Chair: Tom Loredo, Cornell University

2:15-4:00 PM “An Alternative Approach to Modelling the Bias Term” Rui Paulo, ISEG, Technical University of Lisbon

“Cholesky Decomposition - A Tool with Many Uses” Anthony O’Hagan, University of Sheffield

4:00-4:15 PM Break

4:15-5:00 PM “Generalised Aberration for Space Filling Designs” Hugo Maruri-Aguilar, London School of Economics

Speaker Abstracts

John Paul Gosling University of Sheffield Department of Probability and Statistics [email protected]

“Emulation of Dynamic Computer Models”

Highly efficient techniques to analyse computer code output have been developed based on a statistical model that is built to emulate the computer model. The approach, however, is less straightforward for dynamic simulators, which are designed to represent time-evolving systems. In this talk, I will describe a recursive system to emulate dynamic models; the system is illustrated with an application to a rainfall-runoff model.

Serge Guillas Georgia Institute of Technology Department of Mathematics [email protected]

“Calibration of a 3-D Air Quality Model”

RAQAST is a regional chemistry and transport modeling system. It is used to provide 48-hour forecasts of ozone concentrations over the United States. Meteorological forecast is conducted using the MM5 model. The regional chemistry and transport model simulates the sources, transport, chemistry, and deposition of 24 chemical tracers. Several tuning parameters in RAQAST are unknown. We select a set of five tuning parameters that are likely to strongly affect ozone forecasts: emissions of isoprene, NOx, boundary layer height (through diffusion), effects of clouds on photolysis, and anthropogenic VOC.

Using a design of 100 runs of RAQAST for a week in July 2005, we carry out the statistical calibration. Calibration consists of finding the values of the parameters that provide the best match with observations. The technique is Bayesian, involves Markov chain Monte Carlo, and relies on interpolation of computer experiments through a Gaussian stochastic process over the space of parameters. Uncertainties are naturally assessed in the study.

Cari Kaufman SAMSI [email protected]

“Efficient Emulators of Computer Experiments Using Covariance Tapering”

Building emulators of computer code using a Gaussian process model can be computationally infeasible when the number of evaluated input values is large. We consider the use of covariance tapering, in which the original correlation function is multiplied with a correlation with compact support. This produces sparse covariance matrices which can be more easily manipulated. Covariance tapering has been applied to isotropic correlation functions in the context of spatial modelling; here we extend the method to correlation functions whose form is a product of correlations along each input dimension.
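The tapering idea can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a product exponential correlation and a product of one-dimensional Wendland tapers, and the function names, length scales, and taper ranges are hypothetical rather than taken from the talk.

```python
import math
import random


def product_exponential_corr(x, y, length_scales):
    """Product over input dimensions of exp(-|x_k - y_k| / length_scale_k)."""
    return math.exp(-sum(abs(a - b) / s for a, b, s in zip(x, y, length_scales)))


def wendland(r):
    """Compactly supported Wendland correlation: zero for r >= 1."""
    return (1.0 - r) ** 4 * (4.0 * r + 1.0) if r < 1.0 else 0.0


def product_taper(x, y, ranges):
    """Product of one-dimensional tapers, one per input dimension."""
    out = 1.0
    for a, b, h in zip(x, y, ranges):
        out *= wendland(abs(a - b) / h)
    return out


def tapered_corr_sparse(points, length_scales, ranges):
    """Return {(i, j): value} holding only the nonzero tapered correlations."""
    entries = {}
    for i, x in enumerate(points):
        for j, y in enumerate(points):
            t = product_taper(x, y, ranges)
            if t > 0.0:
                entries[(i, j)] = t * product_exponential_corr(x, y, length_scales)
    return entries


rng = random.Random(0)
points = [(rng.random(), rng.random(), rng.random()) for _ in range(100)]
entries = tapered_corr_sparse(points, (0.5, 0.5, 0.5), (0.3, 0.3, 0.3))
fill_fraction = len(entries) / len(points) ** 2  # share of nonzero entries
```

Only pairs of inputs that are close in every dimension keep a nonzero correlation, which is what makes sparse-matrix methods applicable.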

Herbie Lee University of California-Santa Cruz Department of Applied Math & Statistics [email protected]

“Global Optimization Using Pattern Search and Treed Gaussian Processes”

Optimization for complex computer simulations can be difficult when the code does not provide any gradient information with the runs, necessitating the use of a derivative-free optimization method. Asynchronous Parallel Pattern Search (APPS) is such a method that also allows use of an external ‘oracle’ to help guide the optimization search. Here we incorporate statistical modeling via Treed Gaussian Processes. Using a Bayesian statistical model allows us to explicitly quantify our uncertainty about the function output, leading to a more robust global optimization. We also use statistical inference in guiding convergence criteria, so as to allow globally informed stopping rules.
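For readers unfamiliar with pattern search, the following is a minimal serial compass search, the simplest member of that family. It is a sketch only: it is not APPS, it has no parallelism and no statistical oracle, and the function name `compass_search` and the test objective are assumptions made for this example.

```python
def compass_search(f, x0, step=1.0, tol=1e-6, max_iter=10_000):
    """Minimize f without gradients by polling +/- step along each axis."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for k in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[k] += delta
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step *= 0.5  # no poll point helped: contract the pattern
    return x, fx


# Hypothetical smooth objective with its minimum at (1, -2).
best_x, best_f = compass_search(
    lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2, [0.0, 0.0]
)
```

In an oracle-assisted scheme, extra candidate points suggested by a statistical model would be evaluated alongside the poll points; that part is not shown here.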

Hugo Maruri-Aguilar London School of Economics Department of Statistics [email protected]

“Generalized Aberration for Space Filling Designs”

The aberration of a design measures its capability to identify models of a lower (weighted) degree. In a precise algebraic sense, this measure is related to the spread of the design points, and although it is defined for polynomial models, it is not restricted to them. We present recent work and aberration results for Latin hypercube designs and Sobol sequences.

Rui Paulo ISEG, Technical University of Lisbon Department of Mathematics [email protected]

“An Alternative Approach to Modelling the Bias Term”

The Kennedy and O'Hagan (2001) paper models the discrepancy between the output of the computer model and reality by introducing the so-called bias function, an additive error component to the computer model. Another approach, pioneered by Tomassini et al. (2007), in an effort to identify not only the presence of the bias but also its origin, investigates the possibility of certain parameters of the computer model being in effect a function of inputs, as opposed to fixed quantities. We follow this approach but linearize the ensuing computer model formulation, a strategy that results in an additive bias term in the statistical model with a particular structure. This formulation of the bias term involves derivatives of the computer model output, and hence can also be perceived as a way of incorporating derivative information, which is sometimes available, into the validation problem. We report on our findings, both on the methodological and on the computational side, by presenting the results of applying these statistical models to a toy problem.

Peter Reichart EAWAG Department of [email protected]

“Analysing Input and Structural Uncertainty of a Hydrological Model with Stochastic Time-Dependent Parameters”

A recently developed technique for identifying continuous-time, time-dependent, stochastic parameters of dynamic models is put into a framework for identifying the causes of bias in model results and applied to a simple hydrological model. In this technique, state estimation of the time-dependent model parameter is combined with the estimation of (constant) model parameters and parameters of the stochastic process underlying the time-dependent parameter. In the framework for identifying and correcting model deficits, this technique is sequentially applied to the model parameters and the degree of bias reduction of model outputs is analyzed for each parameter. In the next step, the identified time dependence of the parameter is analyzed for correlation with external influence factors and model states. If significant relationships between the time-dependent parameter and influence factors or states can be derived, the deterministic model must be improved. Otherwise, or after improving the deterministic model in a first step, the description of uncertainty in model predictions can be improved by considering stochastic model parameters. The application of this framework to a simple 7-parameter conceptual hydrological model demonstrates its capabilities. The parameters (including additional parameters for input modification) have significantly different potential for bias reduction. Some dependences of time-dependent parameters on internal model states can be found and are used to improve the deterministic model. However, the major contribution to the residuals of the original model cannot be associated with deterministic relationships. Therefore, a stochastic parameter must be introduced to get a description of the data that does not violate the statistical model assumptions. Because of the high uncertainty in precipitation within the watershed (total amount and spatial distribution), we use this parameter for input uncertainty characterization.

Gentry White North Carolina State University and SAMSI Department of Statistics [email protected]

“Bayesian Kalman Filter for Emulation of Complex Engineering Computer Models”

Complex engineering computer models based on finite element analysis (FEM) or non-linear models are often designed to simulate complex dynamic systems. These models can be expensive to run, thus yielding little substantive knowledge on the overall behavior of the system. In these cases it is desirable to construct a simpler computer model, called an emulator, to explore system behavior quickly. A good emulator should include the relevant physics for the system, perfectly interpolate the observed data, and provide reasonable estimates of the complex model output when there are no observations. In the case of modeling dynamic systems it is natural to apply a state-space model, using a Kalman filter in the context of Bayesian estimation in order to construct an emulator. This model can be shown to be a Gaussian Process and as a result should have good properties as an emulator.
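The interpolation behavior described above can be seen in a scalar Kalman filter with a very small observation variance. This is a minimal sketch with fixed, assumed parameters; it does not perform the Bayesian parameter estimation discussed in the talk, and the model form, the function name `kalman_filter`, and all numerical values are hypothetical.

```python
def kalman_filter(observations, a, q, r, m0=0.0, p0=1.0):
    """Filter a scalar linear-Gaussian state-space model.

    State:        x_t = a * x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation:  y_t = x_t + v_t,          v_t ~ N(0, r)
    A None observation means "no data at this step": predict only.
    """
    m, p = m0, p0
    means, variances = [], []
    for y in observations:
        # Predict step: propagate mean and variance through the dynamics.
        m, p = a * m, a * a * p + q
        if y is not None:
            # Update step: blend prediction and observation via the gain k.
            k = p / (p + r)
            m = m + k * (y - m)
            p = (1.0 - k) * p
        means.append(m)
        variances.append(p)
    return means, variances


# Hypothetical simulator output observed at three of five time steps.
ys = [1.0, 0.9, None, None, 0.7]
means, variances = kalman_filter(ys, a=0.95, q=0.01, r=1e-8)
```

With `r` close to zero the filtered mean passes almost exactly through the observed values, while the variance grows over the steps where no observation is available.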

Henry Wynn London School of Economics Department of Statistics [email protected]

“Generalized Aberration for Space Filling Designs”

The aberration of a design measures its capability to identify models of a lower (weighted) degree. In a precise algebraic sense, this measure is related to the spread of the design points, and although it is defined for polynomial models, it is not restricted to them. We present recent work and aberration results for Latin hypercube designs and Sobol sequences.

(Joint work of Henry Wynn and Hugo Maruri-Aguilar)

F. Terrestrial Mid-Program Workshop April 4, 2007

Wednesday – April 4, 2007 NISS Building, Room 104

Workshop Focus: Subgrid processes that will control forest dynamics responses to climate change.

8:00-8:30 AM Registration and Continental Breakfast

SESSION I: The developing models - The thrust here is local scale processes that control forest dynamics and respond to regional climate change. These talks will review what has been accomplished by the SAMSI working group. How is climate being downscaled, and how is it being incorporated into the stand simulator?

8:30-9:00 AM “Overview on Modeling Climate Heterogeneity and its Consequences for Forest Diversity” Jim Clark, Duke University Michael Dietze, Harvard University

9:00-10:00 AM “State-Space Modelling of Soil-Moisture” Jonathan Rougier, University of Bristol

10:00-10:30 AM “Progress Toward Interpolating Stochastic Model Output” Jim Crooks, SAMSI

10:30-11:00 AM Break

SESSION II: Participant research overviews - What questions are being asked at other sites, for the related processes, and with related tools that require understanding of downscaled climate data and/or model output?

11:00-11:15 AM “Predicting the Dynamics of Caribbean Forests under Global Climate Change: Interactions of Hurricane Damage and Human Land Use” Maria Uriarte, Columbia University

11:15-11:30 AM “The East Slope of the Tropical Andes and the Adjacent Amazon Harbor Earth’s Highest Biodiversity” Ken Feeley, Harvard University Miles Silman, Wake Forest University

11:30-11:45 AM “Applying Bayesian Inference to Quantify the Uncertainties Associated with a Parsimonious Conceptual Hydrological Model” Wei Wu, Duke University

11:45-12:00 PM “Connections to Wireless Technology (Radiation)” Benoit Courbaud, CEMAGREF and Duke University

12:00-12:15 PM “Connections to Wireless Technology (H2O)” Alan Gelfand, Duke University

12:15-1:30 PM Lunch

DISCUSSIONS - These issues can be collapsed into a single extended discussion. The principal goals here are to determine if the modeling directions taken thus far will address the broadest range of forest dynamic questions and to plan for the next stage of potential collaborations.

1:30 PM Climate/Model Data Needs Moderator: Steve Sain, NCAR

Inference Issues Moderator: Alan Gelfand, Duke University

New Developments in Algorithms Moderator: Pankaj Agarwal, Duke University

Research Directions Moderators: Dave Bell, Duke University Carl Salk, Duke University

6:00 PM Adjourn

Speaker Abstracts

James S. Clark Duke University Nicholas School of the Environment and Earth Sciences [email protected]

“Overview on Modeling Climate Heterogeneity and its Consequences for Forest Diversity”

I will summarize the modeling approach used to evaluate forest dynamic responses to climate change, emphasizing activities of the working group during this semester. Topics will include overall structure for the inferential model, used to estimate parameters, the stand simulator that uses those estimates for prediction, and ways in which heterogeneity in climate enters the model and affects model behavior.

Benoit Courbaud CEMAGREF and Duke University [email protected]

“Connections to Wireless Technology (Radiation)”

In the SLIP model, adult tree and sapling growth are directly related to individual radiation interception. Trees also influence the distribution of radiation available for their neighbours. The distribution of foliage and the distribution of radiation in the canopy are key drivers of forest dynamics.

Objective: The wireless sensor network measures global radiation over time with different sensors located in the canopy at two sites (Eno and Blackwood) and above the canopy at one site (Eno). The objective of this work is to predict radiation at any point and infer the distribution of foliage in the canopy.

Method: The forest canopy is divided into cubic cells with two possible states (occupied or not). Global radiation above the canopy is divided into direct (coming from the sun direction) and diffuse (coming from the sky hemisphere) components. Sun location is calculated for every time step, as well as the list of cells crossed by the direct ray targeting every sensor. A Bayesian analysis is conducted to infer the coefficient of radiation interception by an occupied cell, the gap fraction above every sensor and the probability of occupancy of every canopy cell around sensors.

Results: Results are compared to the classical approach, which consists of mapping every tree in the forest and reconstructing the canopy based on geometrical crown shapes calculated from tree diameter.

James Crooks SAMSI [email protected]

“Progress Toward Interpolating Stochastic Model Output”

Deterministic computer model output may be interpolated across the input space using, for example, the GASP methodology. However, if the output is stochastic a different approach may be required, one that combines CDF estimation with standard interpolation. I will present recent work toward this goal using a combination of Dirichlet processes and copulas.

Jonathan Rougier SAMSI and University of Bristol Department of Mathematics [email protected]

“State-Space Modelling of Soil-Moisture”

We construct a deterministic model of the temporal evolution of the soil-moisture field on a plot, subject to climatic forcing and topography. This model has a relatively small number of parameters (fewer than ten) and can operate at many different resolutions, both spatially and temporally. We implement our model for the Coweeta plot located in the southern Appalachian Mountains in North Carolina. We will calibrate the model parameters against proxy data, using a stochastic generalisation and ‘likelihood-free’ sequential methods.

Joint work with Cari Kaufman, SAMSI.

Miles Silman Wake Forest University Department of Biology [email protected]

Ken Feeley Harvard University Center for Tropical Forest Science [email protected]

“The East Slope of the Tropical Andes and the Adjacent Amazon Harbor Earth’s Highest Biodiversity”

Large gradients in temperature, precipitation, and soils are found over short distances, making it an ideal laboratory for studying species ranges, but frustrating efforts to make all but the simplest predictions about future ranges based on scenarios of climate change. This talk will present an overview of forest research in SE Peru, focusing on a network of plots extending from Amazonian lowlands to Andean treeline across a ~17 deg C temperature gradient. Rainfall varies from 1600mm yr-1 to 5000mm yr-1 in the lowland plots, and from 800mm yr-1 to 8000mm yr-1 in the Andean plots. Discussion will also include the relationship between tropical forest biomass and C storage, and predicting its response to climate change.

Maria Uriarte Columbia University Department of Ecology [email protected]

“Predicting the Dynamics of Caribbean Forests under Global Climate Change: Interactions of Hurricane Damage and Human Land Use”

Since 1995 the North Atlantic has had many active hurricane seasons, including the record-breaking 2005 season. This high level of activity has been related to multi-decadal variability in the Atlantic Ocean. Evidence of a trend in the intensity of Atlantic hurricanes in the past 30 years has also been shown. This global increase of sea surface temperature is primarily considered to be due to anthropogenic factors. There is still much controversy over this issue. Some studies attribute the trend in tropical cyclone activity to more accurately monitored measurements in recent years. Due to these many factors the detection of a trend in hurricane activity is very difficult, but could be very important.

Human land use is arguably the most pressing environmental issue in tropical areas worldwide, including many of the regions subject to intense tropical storms. The linkage between physical processes and human activity plays a critical role in the maintenance of sustainable terrestrial ecosystems, but the structure of this linkage is poorly characterized and rarely quantified. The systemic, impulsive perturbations caused by hurricanes afford one of the few observational opportunities to examine the time-dependent, cross-cutting relationships among climate, ecosystems, and land-use.

I will present results of our modeling efforts to (1) provide a state-of-the-art forest simulator that integrates the recently observed increase in hurricane activity and resulting changes in precipitation and wind with human effects on land use cover and (2) test critical aspects of this model in Puerto Rico using both existing land cover data and vegetation plots spread across a range of storm exposure and land cover classes. Puerto Rico is an ideal case to test the model because economic shifts in the last few decades have led to the abandonment of agriculture in favor of manufacturing with concomitant increases in urban development. This change has created a range of vegetation types of various ages in which we can study the process of vegetation transition and recovery from hurricane damage and human land use. An important outcome of this work is that it will examine the relationships between hurricanes, climate change, human activity, and ecosystems.

Wei Wu Duke University Nicholas School of the Environment and Earth Sciences [email protected]

“Applying Bayesian Inference to Quantify the Uncertainties Associated with a Parsimonious Conceptual Hydrological Model”

Hydrologists are shifting their focus from seeking optimization strategies for identifying a single best model to trying to reduce the uncertainties in the predictions of the models. This is not only because there usually is no single best combination of parameters that optimizes the behavior of a model, but also because hydrologists realize that consistency is more important than optimality. Many hydrological models are used to analyze current systems and to forecast into the future under scenarios of climate and land use change, and therefore assist the sustainable management of water resources. It is key for policy makers to be aware of the uncertainties associated with the models. Research to date has focused mainly on quantifying parameter and data uncertainties; however, model structure error can be the most significant component of the overall predictive uncertainty.

This paper presents a case study that applies data assimilation in a Bayesian framework to determine the uncertainties associated with model structure as well as with data and parameters. Compared to many ad hoc methods, the Bayesian approach has the advantages of easy interpretation, consistency, and the ability to learn. The hydrological model we studied is GR4J, a daily lumped rainfall-runoff model with four parameters. We revised the model to include a snow component. We applied this model to two control watersheds in the Coweeta Basin in southwestern North Carolina. We ran the data assimilation both with and without TDR measurements in the model to compare the results.

D. HIGH DIMENSIONAL INFERENCE AND RANDOM MATRICES

A. Opening Workshop and Tutorials September 17-20, 2006

Sunday, September 17, 2006 Radisson Hotel Room H, 3rd Floor

8:00-8:50 AM Registration and Continental Breakfast

8:50-9:00 AM Welcome: Nell Sedransk, SAMSI and NISS

Tutorials: 9:00-10:00 AM Tutorial Lecture #1: Ofer Zeitouni, University of Minnesota

10:00-10:30 AM Coffee Break

10:30-12:30 PM Tutorial Lecture #2: Craig Tracy, University of California-Davis

12:30-1:30 PM Lunch

1:30-2:30 PM Tutorial Lecture #3: Ofer Zeitouni, University of Minnesota

2:30-3:00 PM Break

Overview Talks: 3:00-4:00 PM Roland Speicher, Queen's University, "Random Matrices and Free Probability Theory" 4:00-5:00 PM Alan Edelman, Massachusetts Institute of Technology, “Applied Stochastic Eigenanalysis”

5:00-5:30 PM Poster Presentation Session (2 minutes each)

6:30-8:30 PM Poster Session and Reception

Room DE, 2nd Floor (Poster presenters please arrive early – posters set-up by 6:15pm)

Monday, September 18, 2006 Radisson Hotel Room H, 3rd Floor

8:00-8:15 AM Registration and Continental Breakfast

8:15-8:30 AM Welcome to SAMSI and Opening of Workshop: Chris Jones, SAMSI and University of North Carolina Iain Johnstone, Stanford University

Applications of Random Matrices: 8:30-9:15 AM David Hoyle, University of Manchester “Large Random Matrices in Bioinformatics and Molecular Biology” 9:15-10:00 AM Gaby Hegerl, Duke University “Climate Change Detection and Attribution: An Example of a Large Small-Sample Problem”

10:00-10:30 AM Coffee Break

10:30-11:15 AM Thomas Marzetta, Bell Labs “A Surprising Random Matrix Result, Wireless Communications, and the Grassmann Manifold” 11:15-Noon Discussion Panel: Doug Nychka, National Center for Atmospheric Research Thomas Guhr, University of Lund

Noon-1:30 PM Lunch

Foundations of Random Matrix Theory: 1:30-2:15 PM Harold Widom, University of California-Santa Cruz “Correlation Functions for Orthogonal Polynomial Random Matrix Ensembles” 2:15-3:00 PM Estelle Basor, California Polytechnic State University “Computations of Linear Statistics for Ensembles of Random Matrices” 3:00-3:45 PM Discussion Panel: Nicholas Ercolani, University of Arizona Peter Miller, University of Michigan Ofer Zeitouni, University of Minnesota

3:45-4:30 PM Break

4:30-5:30 PM SAMSI Distinguished Lecture: David Donoho, Stanford University

Anne T. and Robert M. Bass Professor of Humanities and Sciences and Professor of Statistics “The Breakdown Point of Model Selection When There Are More Variables than Observations” 5:30-6:30 PM Reception Room FG, 3rd Floor

Tuesday, September 19, 2006 Radisson Hotel Room H, 3rd Floor

8:00-8:30 AM Registration and Continental Breakfast

Inference and Regularization: 8:30-9:15 AM Iain Johnstone, Stanford University “On Extreme Eigenvalues and Eigenvectors for Large Covariance Matrices” 9:15-10:00 AM Liza Levina, University of Michigan “Estimation of Large Covariance Matrices”

10:00-10:30 AM Coffee Break

10:30-11:15 AM John Lafferty, Carnegie Mellon University “Sparsity in High Dimensional Regression, Density Estimation, and Graph Inference” 11:15-Noon Discussion Panel: Peter Bickel, University of California-Berkeley Mohsen Pourahmadi, Northern Illinois University Bin Yu, University of California-Berkeley

Noon-1:30 PM Lunch

New Researcher Session: 1:30-1:45 PM Boaz Nadler, Weizmann Institute of Science “Finite Sample Results on the Convergence of PCA for Spiked Covariance Models” 1:45-2:00 PM Leonard Choup, University of California-Davis “Edgeworth Type Expansion of the Distribution of the Largest Eigenvalue in Classical Random Matrix Ensembles” 2:00-2:15 PM Sandrine Peché, University of Grenoble “The Largest Eigenvalue of Some Ensembles of Random Matrices” 2:15-3:00 PM Further talks by new researchers TBA

3:00-3:30 PM Break

Multivariate Statistical Issues: 3:30-4:15 PM Helene Massam, York University “The Wishart, Some Related Distributions and Some Moments”

4:15-5:00 PM Steven Smith, Massachusetts Institute of Technology – Lincoln Lab “Perspectives on Intrinsic Estimation with Applications to Covariance Matrices and Signal Processing” 5:00-5:45 PM Discussion Panel: Ya'acov Ritov, Hebrew University Donald Richards, Pennsylvania State University

Wednesday, September 20, 2006 Radisson Hotel Room H, 3rd Floor

8:00-8:30 AM Registration and Continental Breakfast

Computations and Mathematical Issues: 8:30-9:15 AM Ioana Dumitriu, University of Washington-Seattle “Beta Ensembles: A Brief Survey” 9:15-10:15 AM James Mingo, Queen's University Raj Rao, Massachusetts Institute of Technology “Statistical Eigen-Inference From Large Wishart Random Matrices”

10:15-10:30 AM Coffee Break

10:30-Noon Closing Discussion Panel: Plamen Koev, Massachusetts Institute of Technology Jianqing Fan, Princeton University Brian Rider, University of Colorado-Boulder Friedrich Götze, Bielefeld University

Noon-1:30 PM Lunch

1:30-2:45 PM Working Group Formation

2:45-3:00 PM Break

3:00-4:00 PM Working Group Reports

Speaker Abstracts

Estelle Basor California Polytechnic State University Department of Mathematics [email protected]

“Computations of Linear Statistics for Ensembles of Random Matrices”

This talk will describe the connection between the classical Strong Szegö Limit Theorem and linear statistics for ensembles of some classes of random matrices. Such linear statistics are of the form $\sum_{i=1}^{n}f(\lambda_{i})$ where $\lambda_{1}, \dots, \lambda_{n}$ are the eigenvalues of the random matrix and $f$ is some suitable function. The Szegö Theorem gives a quick, direct way to find asymptotic information about the linear statistics. The choice of conditions on the function $f$ leads to some interesting extensions and generalizations of the classical Szegö Theorem.
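As a small worked illustration of a linear statistic (not taken from the talk), consider the choice $f(x) = x^{2}$ and a fixed, hypothetical $2 \times 2$ symmetric matrix $M$. For power functions the statistic reduces to a trace, $\sum_{i} \lambda_{i}^{k} = \operatorname{tr}(M^{k})$, which can be checked by hand:

```latex
\[
M = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
\lambda_{1} = 1,\ \lambda_{2} = 3, \qquad
M^{2} = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix},
\]
\[
\sum_{i=1}^{2} f(\lambda_{i}) = \lambda_{1}^{2} + \lambda_{2}^{2} = 1 + 9 = 10 = \operatorname{tr}(M^{2}).
\]
```

For a random matrix the same quantity becomes a random variable, and results of the kind described above concern its behavior as $n$ grows.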

Florentina Bunea Florida State University Department of Statistics [email protected]

No Title or Abstract Submitted

Leonard Choup University of California, Davis Department of Mathematics [email protected]

“Edgeworth Type Expansion of the distribution of the Largest Eigenvalue in Classical Random Matrix Ensembles”

We derive an edge scaling correction for the Gaussian and Laguerre kernels for unitary ensembles, use it to write an expansion of $F_{N,2}(t) = P_{\beta}(\lambda_{\max}^{\beta} \le t)$ in terms of $N$ in the first part, and give an outline of the steps needed to extend this analysis to the orthogonal and symplectic cases.

Distinguished Lecture David Donoho Stanford University Department of Statistics [email protected]

“The Breakdown Point of Model Selection When There Are More Variables Than Observations”

Ioana Dumitriu University of Washington Department of Mathematics [email protected]

“Beta Ensembles: A Brief Survey”

Beta ensembles are generalizations of the well known Gaussian (Hermite), Wishart (Laguerre), and MANOVA (Jacobi) models of classical random matrix theory, with the general (positive) parameter $\beta$ acting as an inverse temperature (eigenvalue repulsion strength). The beta ensembles provide a tool for a unified approach to the study of eigenstatistics for the real, complex, and quaternion cases ($\beta = 1,2,4$). This talk will present a brief survey of results for general $\beta$, and provide a glimpse of what happens behind Dyson's “threefold way.”

Alan Edelman Massachusetts Institute of Technology & Interactive Supercomputing Department of Mathematics [email protected]

“Applied Stochastic Eigenanalysis”

In this talk, I will survey the computational and numerical progress in the areas that are known as random matrix theory or Applied Stochastic Eigenanalysis.

Gabriele Hegerl Duke University Nicholas School of the Environment [email protected]

“Climate change detection and attribution: an example of a large small-sample problem”

Research aimed at the detection of climate change and at estimating the contribution to recent warming from increases in greenhouse gases often applies multiple regression techniques using a best linear unbiased estimator. The problem involves separating the contribution of a “fingerprint” for the climate response to greenhouse gas increases to one sample of observed change (such as the global space-time-pattern of temperature increase in the recent 50 years) from that due to internal climate variability and the climate response to changes in other external influences on climate, such as solar radiation, aerosols, and the response to volcanic eruptions.

The problem is spatially large (thousands of stations or gridboxes representing temperature changes, over 50 years). Nevertheless, an estimate of the space-time covariance of climate noise is required, and its dimension needs to be reduced in order to arrive at a robust estimate of the noise covariance (whose inverse is needed in the best linear unbiased multiregression), which is typically estimated from hundreds of years of simulations with climate models. Techniques for data compression such as empirical orthogonal functions (eofs) or principal component analysis are used for this purpose, and methods are available that give some indication if the size of the problem is manageable after dimension reduction.

The EOF analysis results are not only useful for dimension reduction, but also provide interesting information in themselves on patterns of global temperature variability. A challenge is to compare model-simulated EOFs or principal components with those estimated from different models or observations, to assess whether there are significant differences, and to determine how these can best be characterized. Also, the physical interpretation of the patterns of variability has some pitfalls, of which examples are given.

David Hoyle University of Manchester NIBHI, School of Medicine [email protected] http://personalpages.manchester.ac.uk/staff/david.c.hoyle

“Large Random Matrices in Bioinformatics and Molecular Biology”

Bioinformatics is the application of Mathematics, Statistics and Computer Science to the study of biological problems and data. Over the last 5 years or so the nature of available biological data has changed radically, with it now being standard to simultaneously study the activity levels of 1000s of genes within an organism. Consequently bioinformatic data sets are of very high dimension, but typically only consist of a small number of sample points. Covariance based statistical learning algorithms such as PCA are commonly applied to these data sets to elucidate patterns of similarity.

The study of large sample covariance matrices constructed from high-dimensional data and relatively small sample sizes is an interesting issue in its own right, and is also important in helping us to understand the accuracy and limitations of the learning algorithms employed and therefore the validity of the conclusions drawn from the data. Analysis techniques borrowed from statistical physics can provide the required insight into the behaviour of sample covariance matrices in this regime.

In this talk I will give a brief introduction to the types of data bioinformaticians study and the statistical physics techniques used for studying large sample covariance matrices. I will also show how the results of Random Matrix Theory can be used to understand and improve the accuracy of Bayesian model selection criteria for PCA.

The whole genome approach of modern molecular biology means that large random matrices are widespread not just within primary measurements, but also in any secondary, derived or auxiliary data. I will attempt to outline areas of molecular biology and bioinformatics where large random matrices also occur, but to which the techniques of Random Matrix Theory have yet to be extensively applied.

Iain Johnstone Stanford University Department of Statistics [email protected]

“On extreme eigenvalues and eigenvectors for large covariance matrices”

John Lafferty Carnegie Mellon University Department of Computer Science [email protected]

“Sparsity in High Dimensional Regression, Density Estimation, and Graph Inference”

We present a method for simultaneously performing bandwidth selection and variable selection in nonparametric regression and density estimation. Our method starts with a local linear or kernel estimator with large bandwidths, and incrementally decreases the bandwidth in directions where the gradient of the estimator with respect to bandwidth is large. When the unknown function satisfies a sparsity condition, the approach avoids the curse of dimensionality. The method, called rodeo (regularization of derivative expectation operator), conducts a sequence of hypothesis tests, and is easy to implement. A modified version that replaces testing with soft thresholding may be viewed as solving a sequence of lasso problems. In high dimensions the rodeo attains near optimal minimax rates of convergence, as if the relevant variables were known in advance and isolated.


Time permitting, we also discuss recent results on using l1 regularization for inferring the structure of sparse undirected graphical models in the high-dimensional setting. With the number of nodes and maximum neighborhood sizes allowed to grow as a function of the number of observations, we show that logarithmic growth in the number of samples relative to graph size is sufficient to achieve consistency of the estimated graph structure.

Joint work with Larry Wasserman, Han Liu, Pradeep Ravikumar and Martin Wainwright.

Elizaveta Levina University of Michigan Department of Statistics [email protected]

“Estimation of Large Covariance Matrices”

Estimation of covariance matrices has a number of important applications, which include principal component analysis, classification by linear or quadratic discriminant analysis, and inferring independence and conditional independence between variables. It has long been known that the sample covariance matrix has many undesirable features in high dimensions, and many alternatives have been proposed. This talk will first briefly review different types of regularized covariance estimators and then focus on regularization by “banding” either the covariance matrix itself or the Cholesky factor of its inverse. These estimators are shown to be consistent when both data dimension and sample size go to infinity at certain rates. A resampling scheme for selection of tuning parameters is developed. A generalization of this approach is to replace banding with more flexible regularization by use of appropriate penalties. Lasso has been used for this purpose, and a new hierarchical lasso penalty will be introduced, which performs better than regular lasso or banding while also forcing a sparse structure.
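A minimal sketch of the simplest estimator mentioned above, banding the sample covariance matrix itself. The function name `band_covariance` and the fixed bandwidth `k` are hypothetical; the abstract's resampling scheme for choosing the tuning parameter and the Cholesky-factor variant are not reproduced here.

```python
import numpy as np

def band_covariance(S, k):
    """Zero out entries of S more than k positions away from the diagonal."""
    p = S.shape[0]
    i, j = np.indices((p, p))
    return np.where(np.abs(i - j) <= k, S, 0.0)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))   # n = 20 observations, p = 50 variables
S = np.cov(X, rowvar=False)         # 50 x 50 sample covariance
S_banded = band_covariance(S, k=3)  # keep only the 3 nearest off-diagonals
```

Banding presupposes that the variables have a natural ordering, so that entries far from the diagonal are expected to be small.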

Thomas Marzetta Bell Laboratories and Lucent Technologies Communications and Statistical Sciences [email protected]

“A Surprising Random Matrix Result, Wireless Communications, and the Grassmann Manifold”

The multiplication of a deterministic unit-vector by an isotropically-random (Haar measure) unitary matrix results in an isotropically-random unit vector which has a uniformly random direction. A second multiplication by the same matrix yields a unit vector which has a preferred direction. In short, the square of an isotropically-random unitary matrix is not isotropically-random. We rediscovered this remarkable property – originally found by Rains – while attempting to design structured random constellations of space-time signals for multiple-antenna wireless communications in the case where nobody knows the channel. One transmits a space-time matrix which is corrupted by both multiplicative and additive noise before it is received. The message bits are encoded on the subspace that is spanned by the orthonormal columns of the transmitted space-time signal (a Grassmann manifold), and this subspace survives the multiplication of the signal by an unknown channel-matrix.
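The first claim can be checked numerically. The sketch below is an illustration under stated assumptions, not the construction used in the talk: it samples Haar unitary matrices by QR factorization of a complex Gaussian matrix with a phase correction, then estimates the mean squared overlap between the starting vector $e_1$ and its image under $U$ and under $U^2$. The names `haar_unitary` and `mean_overlap` are hypothetical.

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample an n x n unitary matrix from Haar measure via corrected QR."""
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2.0)
    Q, R = np.linalg.qr(Z)
    d = np.diag(R)
    return Q * (d / np.abs(d))  # rescale column k by the phase of R[k, k]

def mean_overlap(n, power, trials, rng):
    """Monte Carlo estimate of E|<e1, U^power e1>|^2 for Haar-distributed U."""
    total = 0.0
    for _ in range(trials):
        U = haar_unitary(n, rng)
        total += abs(np.linalg.matrix_power(U, power)[0, 0]) ** 2
    return total / trials

rng = np.random.default_rng(1)
print(mean_overlap(2, 1, 5000, rng))  # uniform direction: close to 1/n = 0.5
print(mean_overlap(2, 2, 5000, rng))  # after squaring: noticeably larger
```

For a uniformly random direction the expected squared overlap with any fixed unit vector is $1/n$, so an estimate above $1/n$ for $U^2$ indicates that the twice-multiplied vector favors its starting direction.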

Helene Massam York University Department of Mathematics and Statistics [email protected]

“The Wishart, some related distributions and some moments”

We will recall various types of moments, obtained in recent years, for the real and complex Wishart distribution and we will look at the tools used to obtain them. We will then recall some distributions related to the Wishart and defined in the context of graphical models. For these distributions, very few moments are available. A more systematic study of their moments is yet to be done.

James A. Mingo Queen's University Department of Mathematics and Statistics [email protected]

Raj Rao Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science [email protected]

“Statistical Eigen-Inference from Large Wishart Random Matrices”

The asymptotic behavior of the eigenvalues of a sample covariance matrix is described when the observations are from a zero mean multivariate (real or complex) normal distribution whose covariance matrix has population eigenvalues of arbitrary multiplicity. In particular, the asymptotic normality of the fluctuation in the trace of powers of the sample covariance matrix from the limiting quantities is shown. Concrete algorithms for computing the limiting quantities and the covariance of the fluctuations are presented using the framework of free probability and second order freeness, respectively.

Tests of hypotheses for the population eigenvalues are developed and a technique for inferring the population eigenvalues is proposed that exploits this asymptotic normality of the trace of powers of the sample covariance matrix. Monte-Carlo simulations are used to demonstrate the superiority of the proposed methodologies over classical techniques and the robustness of the proposed techniques in high-dimensional, (relatively) small sample size settings.

The results are obtained by a new construction called second order freeness. Second order freeness extends to fluctuations of random matrices what Voiculescu’s first order freeness did for the problem of calculating the eigenvalue distributions of sums and products of random matrices. We will sketch the main ideas of second order freeness and present our second order analogue of the R-transform, which allows us to effectively calculate the fluctuations of A+B from the fluctuations of A and B.

This is joint work with Alan Edelman and Roland Speicher.

Boaz Nadler Weizmann Institute of Science Department of Computer Science and Applied Mathematics [email protected]

“Finite Sample results on the convergence of PCA for spiked covariance models”


In this paper we present three main results regarding the convergence of Principal Component Analysis (PCA) for spiked population models. The first is a finite sample result on the convergence of PCA for a spiked covariance model with a single component. We show that for finite dimensionality $p$ and sample size $n$, under certain conditions and with probability $1-\epsilon$, the top PCA eigenvector and eigenvalue are close to those of population PCA. Secondly, using a perturbation method we derive explicit expressions for the leading order terms in the eigenvalues and eigenvectors of PCA. We show that for finite $p,n$ there is no sharp phase transition as in the infinite case. However, either as a function of noise level or number of samples there can be a sudden sharp "loss of tracking". This occurs due to a crossover between the eigenvalue of the signal and the largest eigenvalue due to noise. Finally, we show how our method can be used to derive the asymptotic pulled up values for the eigenvalues in the joint limit, $p,n\to\infty$.
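The "pulled up" effect in the joint limit can be seen in a small simulation. The sketch below assumes a particular parametrization that the abstract does not specify: unit noise variance and a single population eigenvalue `spike` along the first coordinate. The limiting value used for comparison, $\ell\,(1 + \gamma/(\ell - 1))$ with $\gamma = p/n$ and $\ell > 1 + \sqrt{\gamma}$, is the standard large-$p,n$ result for this model and not one of the finite-sample expressions derived in the paper.

```python
import numpy as np

def top_sample_eigenvalue(p, n, spike, rng):
    """Largest eigenvalue of the sample covariance for a single-spike model."""
    # Population covariance: identity, except variance `spike` along e1.
    X = rng.standard_normal((n, p))
    X[:, 0] *= np.sqrt(spike)
    S = X.T @ X / n
    return np.linalg.eigvalsh(S)[-1]

def pulled_up_limit(spike, gamma):
    """Large-p,n limit of the top eigenvalue, valid for spike > 1 + sqrt(gamma)."""
    return spike * (1.0 + gamma / (spike - 1.0))

rng = np.random.default_rng(0)
top = top_sample_eigenvalue(p=600, n=3000, spike=4.0, rng=rng)
print(top, pulled_up_limit(4.0, 600 / 3000))
```

The sample eigenvalue sits above the population value of 4, and rerunning with smaller `n` shows how the finite-sample fluctuations around the limit grow.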

Sandrine Peche University of Grenoble 1 Department of Mathematics [email protected]

“The largest eigenvalue of some ensembles of random matrices”

We consider Deformed Wigner matrices $H_N=\frac{1}{\sqrt N}H + W_N$ where $H$ is a standard Hermitian random matrix and $W_N$ is a fixed rank deterministic matrix. We prove under mild assumptions that the largest eigenvalue of such random matrices has a universal limiting behavior as $N \to \infty$, for some specific deformation $W_N$. We also say a word about analogous results for complex random sample covariance matrices.

Steven Smith Massachusetts Institute of Technology Lincoln Laboratory [email protected]

“Perspectives on Intrinsic Estimation with Applications to Covariance Matrices and Signal Processing”

The intrinsic nature of estimation theory is fundamentally important, a fact that was realized very early in the field, as evidenced by Rao's first and seminal paper on the subject. Yet in spite of its fundamental importance, intrinsic analysis in statistics, specifically in estimation theory, has in fact received relatively little attention, notwithstanding important contributions from Efron, Amari, Oller and colleagues, Hendriks, and others. This talk presents and compares results from several different perspectives on this subject, and applies these results to well-known problems in covariance matrix estimation and signal processing. Specifically, the covariance estimation is posed as an intrinsic estimation problem on the space of positive definite matrices, which has the structure of a homogeneous or quotient space, not a vector space -- the necessary setting for classical Cramer-Rao bounds. Covariance matrix estimation accuracy bounds are derived from an intrinsic derivation of the Cramer-Rao bound on arbitrary Riemannian manifolds, and compared to the accuracy achieved by standard methods involving the sample covariance matrix (SCM). Estimator efficiency is discussed from different, novel, viewpoints. Remarkably, it is shown that from an intrinsic perspective, the SCM is a biased and inefficient estimator; the bias corresponds to the SCM's poor estimation quality at low sample support -- this contradicts the well-known fact that E[SCM] = R because the linear expectation operator implicitly treats the covariance matrices as points in a real vector space, compared to the intrinsic treatment of positive-definite Hermitian matrices used in this talk. The accuracy bound on unbiased covariance matrix estimators is shown to be about (10/log10)*n/sqrt(K) decibels, where n is the matrix order and K is the sample support. Thus a connection is established between estimation loss for covariance matrices, and the well-known Reed-Mallett-Brennan detection loss in adaptive filtering problems. The analysis approach developed is directly applicable to many other estimation problems on manifolds encountered in signal processing and elsewhere, such as estimating rotation matrices in computer vision and estimating subspace basis vectors in blind source separation. Finally, several intriguing open questions pertaining to the possibility of improved covariance matrix estimation methods at low sample support are presented.

Roland Speicher Queen's University Department of Mathematics and Statistics [email protected]

“Random Matrices and Free Probability Theory”

Assume you are given two symmetric random matrices A and B and you know the eigenvalue distribution of A and the eigenvalue distribution of B. What can you say about the eigenvalue distribution of A+B or of AB? In general, not much since the relation between the eigenspaces of A and the eigenspaces of B is relevant. However, in typical cases and when the size of the matrices tends to infinity the eigenspaces of A and B are in generic position and this problem has a deterministic solution; there are precise formulas for calculating the eigenvalue distribution of A+B or of AB from the eigenvalue distribution of A and the eigenvalue distribution of B. These formulas (and much more) are provided by Voiculescu’s theory of free probability. In this overview talk I will give an idea what free probability is and how it relates to random matrices. In particular, I want to show how tools from free probability (like R-transform or S-transform) provide the answers to the above mentioned problems.
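A small numerical experiment illustrates the "generic position" idea. The example is chosen here for illustration and is not taken from the talk: $A$ and $B$ each have eigenvalues $+1$ and $-1$ in equal proportion, and $B$ is rotated by a Haar-distributed orthogonal matrix. Free probability predicts that the eigenvalues of $A+B$ then follow the arcsine law on $[-2,2]$, whose second and fourth moments are 2 and 6. If $A$ and $B$ instead behaved like independent commuting random variables, the fourth moment would be 8. The function names are hypothetical.

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix from Haar measure via corrected QR."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))  # fix the sign ambiguity column by column

def sum_spectrum(n, rng):
    """Eigenvalues of A + B, where A and B both have eigenvalues +1 and -1."""
    d = np.concatenate([np.ones(n // 2), -np.ones(n - n // 2)])
    A = np.diag(d)
    Q = haar_orthogonal(n, rng)
    B = Q @ np.diag(d) @ Q.T        # same spectrum as A, randomly rotated
    return np.linalg.eigvalsh(A + B)

lam = sum_spectrum(400, np.random.default_rng(0))
print(np.mean(lam**2), np.mean(lam**4))  # compare with 2 and 6
```

The eigenvalue distributions of $A$ and $B$ alone determine the answer here only because the random rotation puts their eigenspaces in generic position.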

Harold Widom University of California-Santa Cruz Department of Mathematics [email protected]

“Correlation functions for orthogonal polynomial random matrix ensembles”

We shall derive formulas for correlation functions and spacing distributions for the unitary orthogonal polynomial random matrix ensembles in terms of integral operators and their kernels. We briefly explain how matrix kernels arise in the analogous formulas for the orthogonal and symplectic ensembles.

Discussion Panel Information

Applications of Random Matrices Discussion Panel Monday-September 18, 2006 11:15-12:00 PM


Douglas Nychka National Center for Atmospheric Research IMAGe [email protected]

Thomas Guhr Lunds Universitet Department of Matematisk Fysik, LTH [email protected]

“Some Research Directions of RMT in Physics”

I will sketch some of the present RMT research in physics, restricting myself to those topics which I consider as particularly important. The list includes embedded ensembles, non-invariant ensembles, applications in quantum chromodynamics, and various aspects of supersymmetry and the recovery of the replica trick.

Discussion Panel: • Nicholas Ercolani, University of Arizona • Peter Miller, University of Michigan • Ofer Zeitouni, University of Minnesota

Discussion Panel: • Peter Bickel, University of California Berkeley • Mohsen Pourahmadi, Northern Illinois University • Bin Yu, University of California Berkeley

Discussion Panel: • Ya’acov Ritov, Hebrew University • Donald Richards, Pennsylvania State University

“Exact inference for multivariate normal populations with monotone incomplete data”

We consider problems in inference with monotone incomplete data drawn from a multivariate normal population. For the case in which the data have a “two-step” monotone missingness pattern, we obtain stochastic representations for the exact, finite-sample distributions of the maximum likelihood estimators of the population mean and covariance matrix. These stochastic representations will be shown to be crucial in developing an extensive theory of finite-sample inference for multivariate normal populations with monotone incomplete samples.

Closing Discussion Panel: • Plamen Koev, Massachusetts Institute of Technology • Jianqing Fan, Princeton University • Brian Rider, University of Colorado at Boulder • Friedrich Götze, Bielefeld University


Poster Abstracts

Jeongyoun Ahn University of Georgia Department of Statistics [email protected]

“The High Dimension, Low Sample Size Geometric Representation Holds Under Mild Conditions”

The theory of large-dimension-fixed-sample-size asymptotics, and some mathematical statistical consequences, were explored by Hall, Marron and Neeman. They examined quantities such as pairwise distances between the data vectors and established the geometric representation of high dimension, low sample size data. A perhaps unappealing assumption of these results was that the entries of the data vectors were nearly independent. Fortunately, the main result of this paper is the same geometric representation, under a very mild assumption on the population eigenvalues. This assumption uses a population version of the locally most powerful invariant test statistic for sphericity, and it is discussed in the context of the phase transition phenomenon.

Edo Airoldi Carnegie Mellon University School of Computer Science [email protected]

“Stochastic Block Models of Mixed Membership”

We consider the statistical analysis of a collection of unipartite graphs, i.e., multiple matrices of relations among objects of a single type. Such data arise, for example, in biological settings, collections of author-recipient email, and social networks. In such applications, typical analyses aim at: (i) clustering the objects of study or situating them in a low dimensional space, e.g., a simplex; and (ii) estimating relational structures among the clusters themselves. For example, in biological applications we are interested in estimating how stable protein complexes (i.e., clusters of proteins) interact. To support such integrated data analyses, we develop the family of stochastic block models of mixed membership. Our models combine features of mixed-membership models (Erosheva and Fienberg 2005) and block models for relational data (Holland et al. 1983) in a hierarchical Bayesian framework. We develop a nested variational inference scheme, which is necessary to successfully perform fast approximate posterior inference in our models of relational data. We present evidence to support our claims, using both real and synthetic data.

Keywords. hierarchical Bayesian models of mixed membership, mean-field approximation, naïve variational inference, nested variational inference, statistical network analysis, social and biological networks.

Steven Baker University of Alabama-Birmingham Department of Applied Mathematics [email protected]


“Lifshitz Tails for Random Schrödinger Operators”

In solid state physics, solutions of the Schrödinger equation are used to model wave propagation in disordered media such as alloys, glass, and amorphous materials. It is well known that understanding the behavior of these solutions depends on an analysis of the spectrum of the associated Hamiltonian. For media containing impurities, the Hamiltonian is modeled by a differential operator where the potential energy exhibits certain randomness which encodes the disorder. In this presentation, we discuss spectral (eigenvalue) properties of continuous and discretized forms of such Hamiltonians. When discretized, the Hamiltonian is modeled using random Jacobi matrices. In particular, we discuss computation of the spectral minimum and its importance in proving the phenomenon of Lifshitz tails for the integrated density of states. The presence of Lifshitz tails indicates that the probability of observing extremal energies (or eigenvalues) is exponentially small. We also highlight recent discoveries for a particular random Schrödinger operator called the random displacement model.

Aiyou Chen Bell Labs and Lucent Technologies Communications and Statistical Sciences [email protected]

“Method of 1-D Projections for Network Tomography”

Ginger Davis University of Virginia Department of Systems and Information Engineering [email protected]

“Evolving Structure in Multivariate Time Series with Application to Stock Sector Data”

Financial data lends itself to multivariate analysis due to its hierarchical structure (e.g. individual securities within sectors within markets). Many models exist for the joint analysis of several financial instruments such as securities due to the fact that they are not independent. These models often assume some type of constant behavior between the instruments over the time period of analysis. Instead of imposing this assumption, we are interested in understanding the dynamic covariance structure in our multivariate financial time series, which will provide us with an understanding of changing market conditions. In order to achieve this understanding, we first develop a multivariate model for the conditional covariance and then examine that estimate for changing structure using multivariate techniques. Specifically, we simultaneously model individual stock data that belong to one of three market sectors and examine the behavior of the market as a whole as well as the behavior of the sectors. Our aims are detecting and forecasting unusual changes in the system, such as market collapses and outliers, and understanding the issue of portfolio diversification in multivariate financial series from different industry sectors.

Alan Edelman Massachusetts Institute of Technology & Interactive Supercomputing Department of Mathematics [email protected]

Brian Sutton Randolph-Macon College Department of Mathematics [email protected]


“From Random Matrices to Stochastic Operators”

We propose that classical random matrix models are properly viewed as finite difference schemes for stochastic differential operators. Three particular stochastic operators commonly arise, each associated with a familiar class of local eigenvalue behavior. The stochastic Airy operator displays soft edge behavior, associated with the Airy kernel. The stochastic Bessel operator displays hard edge behavior, associated with the Bessel kernel. The article concludes with suggestions for a stochastic sine operator, which would display bulk behavior, associated with the sine kernel.

Nicholas Ercolani University of Arizona Department of Mathematics [email protected]

“Some Applications of the Large N Expansion for the Random Matrix One-Point Function”

Xiaoming Huo Georgia Institute of Technology Department of Industrial and Systems Engineering [email protected]

“Connect-the-dots”

Connect-the-dots is a class of problems. I will illustrate these problems, and describe connections with random matrix theory and high dimensional inference.

Plamen Koev Massachusetts Institute of Technology Department of Mathematics [email protected]

“Computing the Hypergeometric Function of a Matrix Argument”

Myung Hee Lee University of North Carolina-Chapel Hill Department of Statistics and Operations Research [email protected]

“Asymptotics of the Direction of Sample Maximal Covariance”

We consider paired multivariate vectors (x,y), where their joint distribution is a multivariate Gaussian. The singular value decomposition of the sample between covariance matrix gives the sample maximal covariance direction vectors. Under certain conditions on the covariance matrix, we show that the inner product between the true and sample maximal covariance vector has a limiting Chi-square distribution, as the dimension grows to infinity with the fixed sample size. This result implies the inconsistency of the sample maximal covariance vector with the theoretical counterpart in a strong sense.


Kwan Lee GlaxoSmithKline Statistical and Quantitative Sciences [email protected]

“Gene Association Networks and Partial Correlation”

Shayan Mukherjee Duke University Institute of Statistics and Decision Sciences [email protected]

“Learning Gradients and Feature Selection on Manifolds”

An underlying premise in the analysis and modeling of high-dimensional physical and biological systems is that data generated by measuring thousands of variables lie on or near a low-dimensional manifold. This premise has led to various estimation and learning problems grouped under the heading of “manifold learning.” It is natural to formulate the problem of feature selection -- finding salient variables (or linear combinations of salient variables) and estimating how they covary -- in the manifold setting. For regression and classification the idea of selecting features via estimates of the gradient of the regression and classification function has been developed. In this paper we extend this approach from the Euclidean setting to the manifold setting. This results in: a method for estimating gradients on a manifold, generalization bounds for this gradient estimate, and a novel variable selection or dimensionality reduction procedure. The utility of our approach is illustrated on simulated and real data.

Joint work with Qiang Wu, Ding-Xuan Zhou.

Tamer Oraby University of Cincinnati Department of Mathematical Sciences [email protected]

“The Limiting Spectra of Random Block-Matrices”

Random block-matrices are of particular interest in some branches of science and some applications, e.g., physics, statistics, time series, image processing and wireless communication. In my research, I study the limiting spectral distribution of some patterns of random block-matrices. Interestingly, they turn out to be mixtures of probability distributions. In Oraby (2005), I analyze among other things the convergence of random block-matrices. In a recent joint paper, Far et al. (2006), we use matrix-valued Cauchy transforms to compute the limiting spectra of certain block-matrices. This poster, which presents my results in Oraby (2006), describes a model for random block-matrices which generalizes a setting considered by Girko (2000). In this poster, I am going to present its limiting spectral distribution under general conditions, outline the proof, give some examples for this model and discuss some other cases.

Junyong Park Purdue University Department of Statistics [email protected]

“Robust Test for Detecting a Signal in a High Dimensional Sparse Normal Vector”

Let $Z_i$, $i=1,...,n$, be independent random variables, $E(Z_i)=\mu_i$, and $Var(Z_i)=1$. We consider the problem of testing $H_0: \mu_i=0, i=1,...,n$. The setup is when $n$ is large, and the vector $(\mu_1,...,\mu_n)$ is ‘sparse’, e.g., $\sum_{i=1}^n \mu_i^2=o(\sqrt{n})$.

We suggest a test which is not sensitive to the exact tail behavior implied under normality assumptions. In particular, if the ‘moderate deviation’ tail of the distribution of $Z_i$, may be represented as the product of a tail of a standard normal and a ‘slowly changing’ function, our suggested test is robust. Such a tail behavior, and a need for such a robust test, is expected when the $Z_i$ are of the form $Z_i=\sum_{j=1}^m Y_{ij}/ \sqrt{m}$, for large $m$, $m<

Joint work with Eitan Greenshtein.

Ricky Rambharat Duke University Institute of Statistics and Decision Sciences [email protected]

“A Sequential Monte Carlo Method for Risk-Neutral American Option Pricing”

We introduce a new method to price American-style options on underlying investments governed by stochastic volatility models. The method combines a standard gridding approach to solving the associated dynamic programming problem, with a sequential Monte Carlo scheme to estimate required posterior distributions of the latent volatility process. The method represents a refinement of previous algorithms since it does not require the volatility process to be directly observable. Instead, the sequential Monte Carlo scheme provides accurate estimates of the required conditional distributions. Furthermore, the method incorporates market price of volatility risk, and is generalizable to handle different kinds of stochastic volatility models. We also demonstrate that with historical data for a few stocks, and appropriately chosen market price of volatility risk, the algorithm yields option prices which are highly consistent with market data.

Reza Rashidi Far Queen's University Department of Mathematics and Statistics [email protected]

“Spectra of Large Block Matrices”

Joint work with Roland Speicher, Wlodzimierz Bryc and Tamer Oraby.

Armin Schwartzman Stanford University Department of Statistics [email protected]


“A Log-normal Distribution for Positive Definite Matrices and Diffusion Tensor Imaging”

Sunder Sethuraman Iowa State University Department of Mathematics [email protected]

“Occupation Laws for Some Markov Time-reinforcement Schemes”

We consider finite-state time-nonhomogeneous Markov chains whose transition matrix at time $n$ is $I + G/n^{\zeta}$ where $G$ is a “generator” matrix, that is $G(i,j) > 0$ for $i, j$ distinct, and $G(i,i) = -\sum_{k \neq i} G(i,k)$, and $\zeta > 0$ is a parameter. In these chains, as time grows, the positions are less and less likely to change, and so form natural “reinforcement” schemes. Although it is shown, on the one hand, that the position at time $n$ converges to a point-mixture for all $\zeta > 0$, on the other hand, the average position up to time $n$, when variously $0 < \zeta < 1$, $\zeta > 1$ or $\zeta = 1$, is shown to converge to a constant, a point-mixture, or a distribution $\mu_G$ with no atoms and full support on a simplex respectively, as $n \to \infty$. The last type of limit can be seen as a sort of “spreading” between the cases $0 < \zeta < 1$ and $\zeta > 1$. In particular, when $G$ is appropriately chosen, $\mu_G$ is a Dirichlet distribution with certain parameters, reminiscent of results in Pólya urns.

Joint work with Zach Dietz (Tulane University).

Haipeng Shen University of North Carolina-Chapel Hill Department of Statistics and Operations Research [email protected]

“Sparse Principal Component Analysis via Regularized Low Rank Matrix Approximation”

Daniel Zantedeschi University of California-Santa Cruz Department of Applied Math and Statistics [email protected]

“Bayesian Statistical Models for Climate Model Output”

Lingsong Zhang University of North Carolina-Chapel Hill Department of Statistics and Operations Research [email protected]

“Functional Singular Value Decomposition and Recenterings”

Joint work with J. S. Marron, Haipeng Shen and Zhengyuan Zhu.

Ou Zhao University of Michigan Department of Statistics [email protected]

“Law of the Iterated Logarithm for Stationary Processes”

B. Bayesian Focus Week October 30-November 3, 2006

Monday, October 30, 2006 Radisson Hotel RTP Room H, 3rd Floor

8:30-9:15 AM Registration and Continental Breakfast

9:15-9:30 AM Welcome Chris Jones, University of North Carolina at Chapel Hill Hélène Massam, York University

Covariance Estimation: 9:30-10:30 AM Mohsen Pourahmadi, Northern Illinois University “Searching for Ideal Priors (Reparametrization) for Covariance Matrices” 10:30-11:30 AM Dongchu Sun, University of Missouri “Objective Bayesian Analysis for the Normal Variance and Precision Matrices for Nonstandard Cases”

11:30-11:45 AM Break

Covariance Estimation: 11:45-12:45 PM Merlise Clyde, Duke University “Classification and Covariance Estimation via Bayesian Shrinkage”

12:45-2:00 PM Lunch

Graphical Models: 2:00-3:00 PM Nanny Wermuth, Chalmers University of Technology “Distortions of Effects” 3:00-4:00 PM Mathias Drton, University of Chicago “Finiteness in Factor Analysis”

4:00-4:30 PM Break

Graphical Models: 4:30-5:30 PM Carlos Carvalho, Duke University “Dynamic Matrix-Variate Graphical Models”

6:30-8:30 PM Poster Session and Reception

SAMSI will provide poster presentation boards and tape. The board dimensions are 4 ft. wide by 3 ft. high. They are tri-fold with each side being 1 ft. wide and the center 2 ft. wide. Please make sure your poster fits the board. The boards can accommodate up to 16 pages of paper measuring 8.5 inches by 11 inches.

Tuesday, October 31, 2006 Radisson Hotel RTP Room H, 3rd Floor

8:00-8:30 AM Registration and Continental Breakfast

High Dimensional Inference: 8:30-9:30 AM Alberto Roverato, University of Bologna “A Robust Procedure for Gaussian Graphical Model Search from Microarray Data with p larger than n” 9:30-10:30 AM Adrian Dobra, University of Washington “High Dimensional Structural Learning and its Applications to Data Fusion”

10:30-10:45 AM Break

High Dimensional Inference: 10:45-11:45 AM Dhafer Malouche, Ecole Supérieure de la Statistique et de l’Analyse de l’Information “Order of Separability in Graphs and the uPC Algorithm”

11:45-1:00 PM Lunch

Bayesian Multivariate Thinking: 1:00-2:00 PM Mike West, Duke University “Sparse Statistical Modelling: A Bayesian Thinking about Random Factors, Graphs and Matrices” 2:00-3:00 PM Anthony O’Hagan, University of Sheffield “Eliciting Multivariate Distributions”

3:00-3:30 PM Break

Bayesian Multivariate Methods: 3:30-4:30 PM Thomas DiCiccio, Cornell University “Objective Bayesian Inference and Covariance Matrices” 4:30-5:30 PM Makram Talih, Hunter College of the City University of New York “Geodesic Random Walk on Covariance Matrices: Construction and Practical Issues”

Wednesday, November 1, 2006 Radisson Hotel RTP Room AB, 2nd Floor

8:30-9:00 AM Registration and Continental Breakfast

Machine Learning: 9:00-10:00 AM Francis Bach, Ecole des Mines de Paris “Learning Graphical Models for Stationary Time Series” 10:00-11:00 AM Tom Griffiths, University of California, Berkeley “The Indian Buffet Process”

11:00-11:15 AM Break

Machine Learning: 11:15-12:15 PM Sayan Mukherjee, Duke University “Integral Operators and Priors”

12:15-1:15 PM Lunch

Inference in Bio-Medical and Genetic Problems: 1:15-2:15 PM Lurdes Inoue, University of Washington “Network Models for Time-Course Microarray Data” 2:15-3:15 PM Sung Jun, Los Alamos National Laboratory “High Dimensional Inference in Biomedical Magnetoencephalography Brain Imaging”

3:15-3:45 PM Break

Inference in Bio-Medical and Genetic Problems: 3:45-4:45 PM Malgorzata Bogdan, Wroclaw University of Technology, Poland “On some Bayesian Procedures for Multiple Testing”

4:45-6:00 PM Formation of Working Groups

Thursday, November 2, 2006 Radisson Hotel RTP Room H, 3rd Floor

8:30-9:30 AM Registration and Continental Breakfast

9:30-12:45 PM Free for Individual Work or Group Discussion

12:45-2:00 PM Lunch

2:00-3:00 PM Thomas Richardson, University of Washington “Binary Models for Marginal Independence”

3:00-4:00 PM Working Groups Open Discussion

Friday, November 3, 2006 Radisson Hotel RTP Room H, 3rd Floor

8:30-9:30 AM Registration and Continental Breakfast

9:30-12:45 PM Free for Individual Work or Group Discussion

12:45-2:00 PM Lunch

2:00-4:00 PM Working Groups Open Discussion

Speaker Abstracts

Francis Bach Ecole des Mines de Paris

“Learning Graphical Models for Stationary Time Series”

Probabilistic graphical models can be extended to time series by considering probabilistic dependencies between entire time series. For stationary Gaussian time series, the graphical model semantics can be expressed naturally in the frequency domain, leading to interesting families of structured time series models that are complementary to families defined in the time domain. In this paper, we present an algorithm to learn the structure from data for directed graphical models for stationary Gaussian time series. We describe an algorithm for efficient forecasting for stationary Gaussian time series whose spectral densities factorize in a graphical model. We also explore the relationships between graphical model structure and sparsity, comparing and contrasting the notions of sparsity in the time domain and the frequency domain. Finally, we show how to make use of Mercer kernels in this setting, allowing our ideas to be extended to nonlinear models. (In collaboration with M.I. Jordan, U.C. Berkeley)

Malgorzata Bogdan, PhD Wroclaw University of Technology, Poland Institute of Mathematics and Computer Science

“On some Bayesian Procedures for Multiple Testing”

In the spirit of modeling inference for microarrays as multiple testing for sparse mixtures, we present a similar approach to a simplified version of quantitative trait loci (QTL) mapping. Unlike the microarray case, where the number of tests usually reaches tens of thousands, the number of tests performed in scans for QTL usually does not exceed several hundred. However, in typical cases, the sparsity p of significant alternatives for QTL mapping is in the same range as for microarrays. For methodological interest, as well as some related applications, we also consider non-sparse mixtures. Using simulations as well as some theoretical observations, we study the false discovery rate (FDR), power and misclassification probability for modified versions of the Simes-Benjamini-Hochberg (SBH) procedure, aimed at controlling the false discovery rate, as well as for various parametric and nonparametric Bayes and Parametric Empirical Bayes procedures aimed at minimizing the misclassification probability. We observe that, due to problems with the identifiability of model parameters, testing in sparse mixtures requires a strongly informative prior on p. The resulting full and Parametric Empirical Bayes procedures turn out to have very good properties in the interesting range of p < 0.2. We also demonstrate very good properties of a novel application of the nonparametric algorithm for mixture estimation due to Newton (2002), and a relatively good performance of the full Nonparametric Bayes method based on Dirichlet mixtures.
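
For orientation, the unmodified Benjamini-Hochberg step-up rule, on which the SBH variants mentioned in this abstract build, can be sketched as follows. This is the textbook procedure, not the modified versions studied in the talk; the function name, the level `q`, and the example p-values are assumptions chosen for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    # Reject the hypotheses with the cutoff_rank smallest p-values.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values for illustration.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p, q=0.05))
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the sorted list meets its threshold.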

Carlos Carvalho Duke University Department of Statistical Science

“Dynamic Matrix-Variate Graphical Models”

This paper introduces a novel class of Bayesian models for multivariate time series analysis based on a synthesis of dynamic linear models and graphical models. The synthesis uses sparse graphical modeling ideas to introduce structured, conditional independence relationships in the time-varying, cross-sectional covariance matrices of multiple time series. We define this new class of models and their theoretical structure involving novel matrix-normal/hyper-inverse Wishart distributions. We then describe the resulting Bayesian methodology and computational strategies for model fitting and prediction. This includes novel stochastic evolution theory for time-varying, structured variance matrices, and the full sequential and conjugate updating, filtering and forecasting analysis. The models are then applied in the context of financial time series for predictive portfolio analysis. The improvements defined in optimal Bayesian decision analysis in this example context vividly illustrate the practical benefits of the parsimony induced via appropriate graphical model structuring in multivariate dynamic modeling. We discuss theoretical and empirical aspects of the conditional independence structures in such models, issues of model uncertainty and search, and the relevance of this new framework as a key step towards scaling multivariate dynamic Bayesian modeling methodology to time series of increasing dimension and complexity.

Merlise Clyde Duke University Department of Statistical Science

“Classification and Covariance Estimation via Bayesian Shrinkage”

We approach the problem of estimating a sparse precision matrix by representing the joint distribution as a series of compositional regression problems, as in Wermuth. Under the well-known Wishart prior, regression coefficients have independent Gaussian distributions, which lead to ridge-regression shrinkage of regression coefficients but non-sparse estimates of precision matrices. To achieve sparsity, we extend the normal prior distributions to include scale mixtures of normals, such as the double exponential or the Normal-Exponential-Gamma prior of Griffin and Brown. These priors lead to LASSO-like shrinkage of regression coefficients in the compositional regressions and, as modal estimates of coefficients may be exactly zero, to sparse estimates of the precision matrix. We will discuss simulation results and applications to gene-expression data. If time permits, we will discuss extensions to classification problems based on retrospective designs.

*This is joint work with Rosy Luo and Ed Iversen

Thomas DiCiccio Cornell University Department of Social Statistics

“Objective Bayesian Inference and Covariance Matrices”

Some basic notions of objective Bayesian inference are reviewed, particularly in the context of a vector parameter of interest. The effect of the ensuing "noninformative" priors on estimation is investigated, and techniques for implementing these priors are discussed. These techniques are applied in the context of estimation problems about covariance matrices.

Adrian Dobra University of Washington Department of Statistics

“High-Dimensional Structural Learning and Its Applications to Data Fusion”

I present a novel constructive approach called HdBCS to generate large-scale undirected Gaussian graphical models based on a sparse representation of the joint distribution of covariates via sets of linear regressions. I discuss the validity of my stochastic search algorithm and show how to estimate various dependence measures (e.g., Kendall's tau, Spearman's rho) by taking into account model uncertainty. Next I develop a comprehensive framework for combining ordered categorical and continuous covariates into parsimonious predictive models for categorical variables with two or more levels. I emphasize the importance of parallel computing in exploring huge spaces with thousands of covariates. I will end with an example focusing on the identification of multivariate patterns of association among gene expression profiles, SNPs and clinical data that are predictive of atherosclerosis burden in human target tissues.

Mathias Drton University of Chicago Department of Statistics

“Finiteness in Factor Analysis”

In factor analysis, observed variables are modeled as conditionally independent given fewer hidden variables, known as factors, and all the random variables follow a multivariate normal distribution. The parameter space of a factor analysis model is a set of covariance matrices. Consider a fixed number of factors and an increasing number of observations, that is, the size of the covariance matrix increases. Does there exist a distinguished matrix size after which one can determine whether a given covariance matrix belongs to the parameter space by determining whether principal submatrices of the distinguished size belong to the corresponding parameter space? This finiteness question is studied for a small number of factors. When finiteness holds, there is hope for the existence of goodness-of-fit tests that perform well in high-dimensional settings.

Tom Griffiths University of California, Berkeley Department of Psychology

“The Indian Buffet Process”

Methods from nonparametric Bayesian statistics are often used to address model selection problems such as determining the number of components in a mixture model. At the heart of these methods is the Chinese restaurant process (CRP), which defines a distribution on assignments of observations to components in a way that does not limit the number of components. In this work, we expand the class of statistical models to which nonparametric Bayesian methods can be applied by defining a distribution on binary matrices that leaves the number of columns unbounded. This distribution can be used as a prior over matrices of latent features or over bipartite graphs, allowing the effective dimensionality of these structures to be determined by the data. This distribution has several desirable properties, analogous to those of the CRP. In particular, it can be described as the outcome of a simple sequential process, which we call the Indian buffet process, and the rows of the resulting matrix are exchangeable. I will outline the basic ideas behind this approach, as well as some recent applications and extensions. This is joint work with Zoubin Ghahramani and Frank Wood.
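
The simple sequential process mentioned in this abstract can be sketched as below, following the usual textbook description of the Indian buffet process: customer i takes each previously sampled dish k with probability m_k / i, where m_k is the number of earlier customers who took it, and then samples Poisson(alpha / i) new dishes. The parameter name `alpha`, the helper names, and the seed are illustrative assumptions rather than details given in the talk.

```python
import math
import random

def sample_poisson(rate, rng):
    """Draw from a Poisson distribution using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_ibp(num_customers, alpha, seed=0):
    """Sample a binary feature matrix (rows = customers, columns = dishes)."""
    rng = random.Random(seed)
    dish_counts = []   # dish_counts[k] = number of customers who took dish k
    rows = []          # rows[i] = dish indices chosen by customer i + 1
    for i in range(1, num_customers + 1):
        # Take each previously sampled dish k with probability m_k / i.
        chosen = [k for k, m_k in enumerate(dish_counts) if rng.random() < m_k / i]
        # Then sample Poisson(alpha / i) new dishes.
        num_new = sample_poisson(alpha / i, rng)
        first_new = len(dish_counts)
        chosen.extend(range(first_new, first_new + num_new))
        for k in chosen:
            if k < len(dish_counts):
                dish_counts[k] += 1
            else:
                dish_counts.append(1)
        rows.append(chosen)
    num_dishes = len(dish_counts)
    matrix = []
    for row in rows:
        taken = set(row)
        matrix.append([1 if k in taken else 0 for k in range(num_dishes)])
    return matrix

for row in sample_ibp(num_customers=6, alpha=2.0, seed=1):
    print(row)
```

The number of columns is not fixed in advance; it grows with the number of customers, which is the sense in which the number of latent features is left unbounded.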

Lurdes Inoue University of Washington Department of Biostatistics

“Network Models for Time-Course Microarray Data”

Network models are the focus of a growing number of researchers concerned with discovering novel gene interactions and regulatory relationships between genes from time-course expression data. In this talk we will discuss some approaches for inferring networks from time-course expression data, exploring global or time-varying relationships between genes. We will illustrate the methods with simulation studies and a case study using time-course microarray data arising from animal models of prostate cancer progression.

This is joint work with D. Telesca, M. Neira, C. Nelson, M. Gleave and R. Etzioni.

Sung Jun Los Alamos National Laboratory Applied Modern Physics Group

“High Dimensional Inference in Biomedical Magnetoencephalography Brain Imaging”

The magnetoencephalographic (MEG) and electroencephalographic (EEG) brain signals have recently been investigated to detect spatial and temporal brain current activations. These current source localization problems are inverse problems. In particular, MEG/EEG signals are inherently very noisy (around 0-4 dB), so the problems are unavoidably and seriously ill-posed.

We show why Bayesian inference is promising for these problems and discuss related issues: spatio-temporal noise covariance estimation, sampling techniques, computational strategies, and so on. Finally, we present our outlook on the applicability of Bayesian inference analysis to other biomedical imaging techniques.

Dhafer Malouche Ecole Supérieure de la Statistique et de l’Analyse de l’Information Department of Statistics

“Order of Separability in Graphs and the uPC Algorithm”

The aim of this paper is to devise a new PC-algorithm, the uPC-algorithm, for estimating a high-dimensional, undirected and sparse graph associated with a faithful Gaussian graphical model. First, we define the order of separability of a graph as the maximum cardinality among all its minimal separators.

We construct a sequence of graphs by considering successively lower conditional dependency. We prove that these graphs are nested and that, from a finite stage equal to the order of separability, this sequence is constant and equal to the true graph. Thus we estimate the true graph from a given dataset by a step-down procedure based on a recursive estimation of this sequence of nested graphs. After proving the consistency of the uPC-algorithm, we demonstrate its accuracy and computational efficiency on simulated data.

Sayan Mukherjee Duke University Department of Statistical Science

“Integral Operators and Priors”

This talk will argue for considering priors based on integral operators. The main example of focus will be kernel methods. Kernel methods have been very popular in the machine learning literature in the last ten years, mainly in the context of Tikhonov regularization algorithms. In this paper we study a coherent Bayesian kernel model based on an integral operator defined as the convolution of a kernel with signed measures. Priors on the random signed measures correspond to priors on the functions output by the integral operator. Our primary result is the equivalence, under certain classes of signed measures, of the function class defined by the integral operator and the reproducing kernel Hilbert space induced by the kernel. A consequence of this result is a function-theoretic foundation for using some common prior specifications in nonparametric Bayesian modeling: Gaussian process priors and Dirichlet process priors. A general framework for the construction of priors on signed measures using Lévy processes is also presented.

Anthony O'Hagan University of Sheffield Department of Probability and Statistics

“Eliciting Multivariate Distributions”

There exists a very substantial body of research by psychologists on eliciting probability judgments. There is rather less on eliciting a distribution, and almost nothing on eliciting joint distributions. Statisticians have tended to come up with schemes designed to elicit hyperparameters of specific distributional families, and many of these are multivariate. However, this imposes very strong assumptions on the expert whose opinions are to be elicited. I will discuss this work briefly, concentrating on the psychological issues, and then present ideas for eliciting joint distributions. In particular, I will discuss the implications of the inevitable imprecision in expert judgments.

Mohsen Pourahmadi Northern Illinois University Department of Statistics

“Searching for Ideal Priors (Reparametrization) for Covariance Matrices”

In developing Bayesian methods for inference for covariance matrices, traditionally the conjugate prior and/or inverse Wishart is used. However, the success of Bayesian computation (MCMC) in the early 90's has opened up the possibility of using more flexible and elaborate nonconjugate priors. Such priors are introduced using reparametrizations of covariance matrices based on their variance-correlation, spectral and Cholesky decompositions. In this talk, we review some of the most notable contributions in this area in the last 15 years and point out the most "ideal" results coming from the three decompositions.

Thomas Richardson University of Washington Department of Statistics

“Binary Models for Marginal Independence”

Log-linear models are a classical tool for the analysis of contingency tables. In particular, the subclass of graphical log-linear models provides a general framework for modeling conditional independences. However, with the exception of special structures, marginal independence hypotheses cannot be accommodated by these traditional models. For example, it is not possible to formulate a model for four variables (A,B,X,Y) such that A is independent of B, and X is independent of Y (and no other restrictions are imposed). Focusing on binary variables I will present a new model class that provides a framework for addressing this problem. The approach is graphical and based on bi-directed graphs, which are in the tradition of path diagrams. In many respects the resulting models and associated fitting algorithms are dual to graphical log-linear models.

Alberto Roverato University of Bologna Department of Statistics

“A Robust Procedure for Gaussian Graphical Model Search from Microarray Data with p Larger than n”

Learning large-scale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are full-order partial correlations, which are partial correlations between two variables given the remaining ones. In the context of microarray data, the number of variables exceeds the sample size, and this precludes the application of traditional structure learning procedures because a sampling version of full-order partial correlations does not exist. In this paper we consider limited-order partial correlations, that is, partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities in deriving the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on limited-order partial correlations, built on a quantity that we call the non-rejection rate. The applicability and usefulness of the procedure are demonstrated on both simulated and real data.

This work is in collaboration with Robert Castelo, University Pompeu Fabra, Barcelona, Spain.
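
To make the notion of a limited-order partial correlation concrete, the sketch below computes the simplest case, a first-order partial correlation of X and Y given a single variable Z, from the three pairwise correlations. It illustrates only the quantity itself, not the non-rejection rate or the search procedure of the talk; the function name and the numerical correlations are assumptions made for the example.

```python
import math

def first_order_partial_correlation(r_xy, r_xz, r_yz):
    """Partial correlation of X and Y given a single conditioning variable Z."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom

# Hypothetical correlations in which the X-Y association is fully explained by Z:
# r_xy equals r_xz * r_yz, so the partial correlation vanishes.
print(first_order_partial_correlation(r_xy=0.4, r_xz=0.8, r_yz=0.5))

# Hypothetical correlations in which an X-Y association remains after conditioning on Z.
print(first_order_partial_correlation(r_xy=0.6, r_xz=0.3, r_yz=0.2))
```

Only three variables enter each computation, so the quantity remains estimable from a small sample even when the total number of variables is far larger than the sample size.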

Dongchu Sun University of Missouri-Columbia Department of Statistics

“Objective Bayesian Analysis for the Normal Variance and Precision Matrices for Nonstandard Cases”

We often obtain observations from normal populations under nonstandard cases. Star-shaped models are very common in graphical and spatial models, where variables are conditionally independent and the precision matrices are sparse. In other cases, there are large numbers of missing values and observations could have a staircase pattern.

We study the objective Bayesian analysis for variance and precision matrices under star-shaped models. Various loss functions, such as entropy and symmetric losses, are considered. With the help of the Cholesky decomposition of the precision matrix, a group of blockwise lower-triangular matrices is introduced. General forms of equivariant estimates of the covariance and precision matrices are obtained. The invariant Haar measures of the group, the reference prior and the Jeffreys prior of the Cholesky decomposition matrix are also discussed. Moreover, using Bayesian methods, we obtain the best equivariant estimators of the precision matrix with respect to this group. Consequently, the MLEs of the precision matrix and covariance matrix are both inadmissible under either entropy or symmetric loss.

For the normal staircase pattern data, it is of interest that the Jeffreys prior is the same as that with complete data. Various objective priors are obtained and compared with the Jeffreys prior and Haar priors. The closed-form expressions of the best equivariant estimators of the covariance and precision matrices are also available.

Makram Talih Hunter College of the City University of New York Department of Mathematics and Statistics

“Geodesic Random Walk on Covariance Matrices: Construction and Practical Issues”

This talk is a report on work in progress, motivated by covariance estimation in the multivariate Normal model when the underlying precision matrix is driven by a hidden Markov chain. The talk will focus mainly on the construction of a random walk (RW) on the cone of symmetric positive definite (SPD) matrices such that the path between consecutive points is a geodesic segment. The resulting matrix-valued RW has the appeal that it remains inside the cone of SPD matrices; thus, it can serve as a model for multivariate stochastic volatility, or, more simply, for Monte Carlo inference from IID multivariate data.

Nanny Wermuth Chalmers/Gothenburg University Department of Mathematical Statistics

“Distortions of Effects”

Unnoticed confounding may severely distort the direction and strength of the effect of an explanatory variable on its response variable, as given by a stepwise data generating process. For direct confounding this effect is known. If it arises from a common unobserved explanatory variable, it is relevant mainly for observational studies, since it is avoided by successful randomization. By contrast, indirect confounding, which we study now, is an issue also for intervention studies. For general stepwise generating processes, we provide matrix and graphical criteria to decide which types of direct or indirect confounding may be present and when they are absent. The tools used are operators for real-valued and for binary matrices that were developed recently.

Mike West Duke University Department of Statistical Science

“Sparse Statistical Modeling: A Bayesian Thinking about Random Factors, Graphs and Matrices”

I will overview our work over the last several years in the application of sparsity modeling in high-dimensional latent factor analysis, prefaced by some general discussion of Bayesian thinking with sparsity priors in simpler contexts of multivariate ANOVA and regression. This overview will focus heavily on our work in motivating applications in genomics, involving biological pathway studies. I will give several examples and discussion of how this applied work has motivated novel methodology including non-Gaussian factor models and methods of stochastic search and computation.

And, for the SAMSI investigators and to keep faith with the primary theme of the workshop, I will note that linear, Gaussian factor models are models of covariance matrices, whereas Gaussian graphical models are models of inverse covariance matrices. Questions arise about the relationships between the two approaches, and also about how much of what we learn from Gaussian models helps us think about non-Gaussian contexts. Time permitting I'll expose some of these questions (no answers, just questions ...)

Poster Abstracts

Edo Airoldi Carnegie Mellon University School of Computer Science

“Lognormal Graphs”

I introduce the concept of lognormal graphs, a fairly large class of graphs that includes power-law graphs. I study how graphs in this class may arise in the real world, and I find that they do in two sets of circumstances: (i) they are a powerful attractor and therefore may arise as a limit of a large class of multiplicative aggregation processes; (ii) they may arise as an artifact of the way we measure the presence of relations among objects in a population of interest. I then show that power-law graphs provide a first-order approximation to lognormal graphs.

Zhonggai Li Virginia Polytechnic Institute and State University Department of Statistics

“Objective Priors in Conditional Independence Models”

In this study, we work on star-shaped models, in which the precision matrix Ω is structured by a special form of conditional independence. There are k + 1 groups of normally distributed variables. The p0 variables in the first group are called global variables; they are correlated with all other variables. Variables in the other groups are called local variables. Local variables from different groups are independent of each other, although local variables in the same group are still correlated. We reparameterize Ω into (D; T) using the Cholesky decomposition and work with the new parameter set to derive its reference prior, which is shown to be equivalent to the right Haar measure. The posterior distribution of (D; T) under the reference prior is derived and given in the random posteriors form.

Jinnan Liu York University Department of Mathematics and Statistics

“Computing the Normalizing Constant by Monte Carlo Method and Model Selection for Discrete Graphical Models”

Let us assume that we are dealing with a contingency table where all variables are binary taking values 0 or 1 under a multinomial sampling scheme. Our goal is to pick a graph G that’s best for the data in the sense that it has the highest posterior density. This posterior of a graph G given data will reduce to the ratio of two normalizing constants if we use a uniform prior on all possible graphs and the conjugate prior on the natural parameters of the multinomial distribution in exponential family form. Neither this ratio nor the normalizing constant itself has a closed form when the graph G is non-decomposable. We will present a Monte Carlo method to compute these normalizing constants and then do model selection based on the ratio of these constants, with applications to graphical models with 4 vertices, using simulated sample data from the smallest non-decomposable model: the 4-cycle. Of all the 64 possible models with 4 vertices, 61 are decomposable. Therefore, we know the theoretical values for the constants and a comparison will be given between the theoretical values and our Monte Carlo estimates to demonstrate that our method can give a numerically accurate estimate for the constants.
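
The count quoted in this abstract (61 decomposable models among the 64 graphs on 4 vertices) can be checked by brute force. The sketch below treats a model as decomposable when its graph is chordal and tests chordality by repeatedly deleting a simplicial vertex, that is, a vertex whose remaining neighbours form a clique. The function and variable names are illustrative and not taken from the poster.

```python
from itertools import combinations

def is_decomposable(vertices, edges):
    """Check chordality by repeatedly removing a simplicial vertex."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    remaining = set(vertices)
    while remaining:
        for v in sorted(remaining):
            nbrs = sorted(adj[v] & remaining)
            # v is simplicial if its remaining neighbours are pairwise adjacent.
            if all(b in adj[a] for a, b in combinations(nbrs, 2)):
                remaining.remove(v)
                break
        else:
            return False  # no simplicial vertex left: the graph is not chordal
    return True

vertices = range(4)
all_edges = list(combinations(vertices, 2))  # the 6 possible edges
graphs = [
    [e for bit, e in enumerate(all_edges) if (mask >> bit) & 1]
    for mask in range(2 ** len(all_edges))
]
num_decomposable = sum(is_decomposable(vertices, g) for g in graphs)
print(len(graphs), num_decomposable)  # prints "64 61"
```

The three graphs that fail the test are the three labelings of the 4-cycle, which agrees with the statement that the 4-cycle is the smallest non-decomposable model.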

Liang Zhang Duke University Institute of Statistics and Decision Sciences

“Local Graphical Model Search”

In contrast to general graphical model search methods, we present here our work on local graphical model search algorithms. Local search algorithms apply when, for example, we are interested in only one gene Y among the thousands of genes in gene expression data and wish to understand the graphical structure around Y, a setting in which the usual (global) graphical model search methods are neither efficient nor precise. The prediction of Y based on the local graphical structure is also of interest. Markov chain Monte Carlo methods and Shotgun Stochastic Search will be used to perform the search. We will provide several examples to analyze the efficiency as well as the precision of our algorithms.

C. Large Graphical Models and Random Matrices Workshop November 9-11, 2006

Thursday, November 9, 2006 MCNC Auditorium, RTP (Shuttle Service from the Radisson to MCNC is Provided from 8:15-9:00 AM)

8:30-9:00 AM Continental Breakfast and Registration

9:00-9:15 AM Welcome Jim Berger/Chris Jones, SAMSI

9:15-12:30 PM Session I “Partial Inversion and Partial Closure of Paths on Graphs: Two Matrix Operators to Study Properties of Large Systems Generated Over Graphs” Nanny Wermuth, Chalmers University of Technology

“Separation in Directed Acyclic Graphs: An Approach Based on Matrix Operators” Giovanni Marchetti, University of Florence

“Multilevel graphical models” Anna Gottard, University of Florence

12:30-2:00 PM Lunch

2:00-4:00 PM Session II “Dynamic Path Analysis: a New Approach to Analyzing Time-Dependent Covariates” Egil Ferkingstad, University of Oslo

“On Graphical Modelling of Data Affected by Sample Selection” Elena Stanghellini, University of Perugia

4:00-4:30 PM Break

4:30-5:30 PM SAMSI Distinguished Lecture MCNC Auditorium, RTP Sir David R. Cox, Honorary Fellow, Nuffield College (Reception follows until 6:30pm) “Some Statistical Challenges Arising from an Issue in Veterinary Epidemiology”

(Shuttle service provided from MCNC to the Radisson. Last shuttle departs MCNC at 6:30 PM)

Friday, November 10, 2006 Radisson Hotel RTP Room BC, 2nd Floor

8:30-9:00 AM Continental Breakfast and Registration

9:00-1:00 PM Session III

“Derived Variables” Sir David R. Cox, Honorary Fellow, Nuffield College

“A Local, First-Order Characterization of Omitted Variable Bias for Propensity-Stratified Data” Ben Hansen, University of Michigan

“The Framework of Principal Stratification for Partially Controlled Studies” Constantine Frangakis, Johns Hopkins University

“Identification of Minimal Sets of Covariates for Matching Estimators” Xavier de Luna, University of Umea

1:00-3:00 PM Lunch

3:00-6:00 PM Session IV

“Representing Equivalence Classes of DAG Models in the Presence of Selection and Latent Variables” Ayesha Ali, University of Guelph

“Partial Mapping, Partial Inversion and Joint-Response Chain Graphs” Michael Wiedenbeck, Mannheim Center of Social Surveys

“A New Technique to Prove Markov Equivalence” Oscar Hammar, Chalmers and Gothenburg University

Saturday, November 11, 2006 Radisson Hotel RTP Room BC, 2nd Floor

8:30-9:00 AM Continental Breakfast and Registration

9:00-1:00 PM Session V

“High-Dimensional Networks as Conduits for Uncertain Information” Arthur Dempster, Harvard University

“The Conjugate Prior for Discrete Hierarchical Models” Hélène Massam, York University

“Robust Model Selection for Graphical Models” Sonja Kuhnt, Eindhoven University of Technology

“Active Structure Search Using Interventions” Frederick Eberhardt, Carnegie Mellon University

1:00-2:00 PM Lunch

Speaker Abstracts

Thursday, November 9 9:15-12:30 PM

Nanny Wermuth Chalmers/Gothenburg University Department of Mathematical Statistics [email protected]

“Partial Inversion and Partial Closure of Paths on Graphs: Two Matrix Operators to Study Properties of Large Systems Generated Over Graphs”

Graphical Markov models have some advantages over other multivariate statistical models. They permit modeling of stepwise data-generating processes, with and without interventions, and working out the consequences of a large model for subsets of variables. Such implications can then be compared with available background knowledge or used to judge seemingly inconsistent results in similar studies. Two recently developed matrix operators, one for real-valued matrices and one for binary matrices, are useful for understanding and deriving properties and implications of graphical Markov models.

Giovanni Marchetti University of Florence Department of Statistics "G. Parenti" [email protected]

“Separation in Directed Acyclic Graphs: An Approach Based on Matrix Operators”

For directed acyclic graph models there are two known criteria to check if a conditional independence between variables A and B given C holds for distributions factorizing according to the given graph. They are called separation criteria because independence holds when the conditioning set C is a separating set, in the sense of graph theory. The first criterion introduces a new concept of d-separation but is applied to the original directed acyclic graph, while the second uses the basic notion of separation in a new induced undirected graph, called the moral graph.

We discuss an alternative approach based on several types of graphical models induced by the directed acyclic graph model, each with an associated edge matrix, i.e. a binary matrix with zeros indicating the structural independencies implied. Independencies implied by the model are assessed by checking that a particular part of the edge matrix of the induced graphical model contains all zeros. This approach also suggests a new separation criterion that is equivalent to the previous ones.
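As an illustration of the second of the two known criteria mentioned above (separation in the moral graph), the following is a minimal sketch in Python. It is not the edge-matrix approach of the talk. The dictionary-of-parents representation and the function names `ancestors_of` and `is_separated` are assumptions made for this example.

```python
from collections import deque


def ancestors_of(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def is_separated(parents, a, b, c):
    """Moral-graph criterion: True if `c` separates `a` from `b`.

    `parents` maps each node to a list of its parents (hypothetical format).
    """
    a, b, c = set(a), set(b), set(c)
    keep = ancestors_of(parents, a | b | c)
    # Moral graph on the ancestral set: join each node to its parents
    # and "marry" every pair of parents that share a child.
    nbrs = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            nbrs[child].add(p)
            nbrs[p].add(child)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # Breadth-first search from `a` that never enters the conditioning set.
    queue = deque(a - c)
    reached = set(queue)
    while queue:
        v = queue.popleft()
        if v in b:
            return False
        for w in nbrs[v] - c:
            if w not in reached:
                reached.add(w)
                queue.append(w)
    return True


collider = {"c": ["a", "b"]}          # a -> c <- b
chain = {"b": ["a"], "c": ["b"]}      # a -> b -> c
```

In the collider graph, `a` and `b` are separated by the empty set but not by `{c}`; in the chain, `a` and `c` are separated by `{b}` but not by the empty set.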

Anna Gottard University of Florence Department of Statistics [email protected]

“Multilevel Graphical Model”

Clustered data are ubiquitous in many fields of applied statistics. Multilevel modeling is one approach to clustered data structures, to be preferred when the hierarchical structure itself is of interest. In this talk, after a brief introduction to the issues raised by clustered data, I will present in detail two-level models for the multivariate Normal case, highlighting a particular set of assumptions on the conditional independence structure of which researchers should be aware. Afterwards, I will introduce a class of chain graphs suitable for multilevel clustered data structures. This class of graphs can represent all the assumptions on the conditional independence structure of a multilevel model by using different types of nodes. The potential and limits of graphical multilevel models will be discussed.

Thursday, November 9 2:00-5:30 PM

Egil Ferkingstad University of Oslo Department of Biostatistics [email protected]

“Dynamic Path Analysis: A New Approach to Analyzing Time-Dependent Covariates”

We introduce a general approach to dynamic path analysis. This is an extension of classical path analysis to the situation where variables may be time-dependent and where the outcome of main interest is a stochastic process. In particular we will focus on the survival and event history analysis setting where the main outcome is a counting process. Our approach will be especially fruitful for analyzing event history data with internal time-dependent covariates, where an ordinary regression analysis may fail. The approach enables us to describe how the effect of a fixed covariate works partly directly and partly indirectly through internal time-dependent covariates. For the sequence of event times, we define a sequence of path analysis models. At each event time, ordinary linear regression is used to estimate the relation between the covariates, while the additive hazard model is used for the regression of the counting process on the covariates. The methodology is illustrated using data from a randomized trial on survival for patients with liver cirrhosis.

Elena Stanghellini University of Perugia Economics Finance and Statistics [email protected]

“On Graphical Modelling of Data Affected by Sample Selection”

The talk will focus on the use of conditional independence models with data affected by sample selection. We first review some results on distortion in linear regression coefficients induced by truncation, which is a particular form of selection. We present examples where a priori knowledge of some conditional independencies allows one to disentangle the distortions induced by truncation from those induced by latent variables, thereby simplifying the estimation of the parameters of interest. Connections to the existing models of sample selection are also established, and examples are given to show how the above results on truncation can be of use in modelling data affected by other forms of sample selection, such as censoring.

SAMSI Distinguished Lecture

Sir David R. Cox, Honorary Fellow, Nuffield College, Oxford MCNC Auditorium, RTP Thursday, November 9, 2006 4:30-5:30 PM (reception to follow)

David R. Cox is among the most important statisticians of the past century. He has made pioneering and highly influential contributions to a uniquely wide range of topics in the theory, methodology, and application of statistics and applied probability. To mention just one, he developed the proportional hazards model, which is the basis for most methodology today in the analysis of survival data. His over 300 papers and books have influenced generations of scholars.

“Some Statistical Challenges Arising from an Issue in Veterinary Epidemiology”

Bovine TB is caused by an organism, M. bovis, that also lives in wildlife: in the US in deer, and in the UK in badgers and deer. Over the last 25 years in the UK the disease has steadily increased and also spread geographically. Eighty years of work on the disease will be summarized with an emphasis on the wide-ranging statistical problems of design, analysis and modelling that have arisen.

Friday, November 10, 2006 9:00-1:00 PM

Sir David Cox Honorary Fellow, Nuffield College, Oxford [email protected]

“Derived Variables”

Often the variables used for analysis and interpretation are combinations of more basic observations, sometimes called pointer readings. A familiar example is the body mass index, weight divided by the square of height. Some implications of graphical Markov models for the construction and analysis of derived variables are outlined.
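The body mass index mentioned above can be written as a one-line derived variable. The sketch below is only an illustration; the function name and the choice of kilograms and metres as units are assumptions, since the abstract does not specify them.

```python
def body_mass_index(weight_kg, height_m):
    """Derived variable: weight divided by the square of height.

    Units (kilograms, metres) are an assumption for this example.
    """
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2
```

For instance, `body_mass_index(70.0, 1.75)` combines two basic observations into a single value of roughly 22.9.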

Ben Hansen University of Michigan Department of Statistics [email protected]

“A Local, First-Order Characterization of Omitted Variable Bias for Propensity-Stratified Data”

One statistical adjustment used in observational studies is to match or stratify on an estimate of the propensity score, the conditional probability of assignment to treatment given available covariates. If potential responses and treatment assignments are conditionally independent given these covariates, then theory and practical experience suggest that the adjustment removes much of the bias due to nonrandom assignment, so that one is left with a study that, for statistical purposes, resembles an experiment randomized within the strata. However, one must be alert to the possibility that one or more covariates omitted from the propensity adjustment would be needed to establish the necessary conditional independence.

This talk outlines an asymptotic analysis of matching or stratification on estimated propensities, with some implications for omitted variable bias that is "local", in a sense to be explained. The framework appears to have relevance both to adjustment strategies based on propensity score stratification alone and to adjustment strategies combining it with other techniques, such as (perhaps) graphical models.

Constantine Frangakis Johns Hopkins Department of Biostatistics [email protected]

“The Framework of Principal Stratification for Partially Controlled Studies”

Our ability to examine treatment effects is often restricted to partially controlled studies, i.e., studies that control only part of the mechanism that assigns the treatments. Such studies have been typically analyzed with the standard framework of instrumental variables. However, the setting of many partially controlled studies suggests that the validity of the assumptions, and even the goals, of standard instrumental variables are questionable. In this presentation, we use as examples three problems to review the role that a richer framework – principal stratification – has for (1) employing more realistic assumptions; (2) setting better goals; and (3) indicating better designs:

(1) Experiments to estimate the effect of School Choice, faced with combined noncompliance to treatment assignment and incomplete follow-up data;

(2) Experiments to estimate the effect of a controlled treatment on an outcome for units for which the controlled treatment does not affect an intermediate, uncontrolled factor, with application from the literature on recent vaccine trials;

(3) Studies to estimate the relation that a health characteristic before a critical event (e.g., injury) has on the risk of death from the critical event, but where the exposure can be missing precisely for people who die just after the critical event.

This work is joint with Don Rubin, Ellen MacKenzie, and Ming-Wen An.

Xavier de Luna Umea University Department of Statistics [email protected]

“Identification of Minimal Sets of Covariates for Matching Estimators”

The Neyman-Rubin model (potential outcome framework) and the associated matching estimators have become increasingly popular, because they allow for the non-parametric estimation of average treatment effects. Like parametric models (e.g., ANCOVA), matching estimators control for a set of covariates (pre-treatment characteristics) in order to estimate the effect of a non-randomized treatment. However, unlike regression models, the selection of the covariates to be used with matching estimators has attracted little attention in the literature. This talk discusses why, when using matching estimators, the set of covariates used has to be "minimal". A set of covariates is said to be minimal if it cannot be reduced without violating the assumptions of the Neyman-Rubin model. Moreover, sufficient conditions are given for the identifiability of a minimal set of covariates. In order to obtain such conditions we use graphical models to impose restrictions on the set of conditional independence statements holding for the random variables involved. Finally, data-driven methods for the selection of the covariates are discussed.

Friday, November 10, 2006 3:00-6:00 PM

Rebecca Ayesha Ali University of Guelph Department of Mathematics and Statistics [email protected]

“Representing Equivalence Classes of DAG Models in the Presence of Selection and Latent Variables”

Maximal ancestral graphs provide a way of encoding the independence relations that arise among the observed variables of DAG models with selection and latent variables. Here, we present a join operation on maximal ancestral graphs that gives rise to a unique representation for Markov equivalent maximal ancestral graphs. This equivalence class representative is analogous to the essential graph for DAGs. We also present a set of orientation rules that construct the equivalence class representative given a single member (ancestral graph) of the equivalence class. These results may be useful in model selection when one is interested in searching across equivalence classes of ancestral graphs.

Michael Wiedenbeck Mannheim Center of Social Surveys [email protected]

“Partial Mapping, Partial Inversion and Joint-Response Chain Graphs”

When a mathematical relation is inverted, the image is exchanged with the argument. The concepts of partial mapping and partial inversion are introduced and discussed. Then, partial inversion is applied to nonsingular covariance matrices to derive linear least squares regression coefficients, conditional covariances and inverse marginal covariance matrices. Another application is the block-triangular decomposition of covariance matrices, which is related to different types of joint-response chain graphs.
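To make the idea concrete, here is a minimal sketch of partial inversion on a single coordinate, derived by exchanging argument and image for that coordinate in a linear system. The function name, the list-of-lists representation and the sign convention are assumptions for this example and may differ from the conventions used in the talk.

```python
def partial_inversion(matrix, index):
    """Partially invert a square matrix on one row/column `index`.

    If M maps (y_a, y_b) to (x_a, x_b), the result maps (x_a, y_b)
    to (y_a, x_b), where a is the single coordinate `index`.
    """
    n = len(matrix)
    pivot = matrix[index][index]
    if pivot == 0:
        raise ZeroDivisionError("pivot entry is zero")
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == index and j == index:
                out[i][j] = 1.0 / pivot
            elif i == index:
                out[i][j] = -matrix[i][j] / pivot
            elif j == index:
                out[i][j] = matrix[i][j] / pivot
            else:
                out[i][j] = (matrix[i][j]
                             - matrix[i][index] * matrix[index][j] / pivot)
    return out


# Illustrative 2x2 covariance matrix (made-up numbers).
sigma = [[4.0, 2.0], [2.0, 3.0]]
swept = partial_inversion(sigma, 0)
```

For this made-up covariance matrix, `swept[0][0]` is the inverse marginal variance of the first variable (0.25), `swept[1][0]` is the least squares regression coefficient of the second variable on the first (0.5), and `swept[1][1]` is the conditional variance of the second given the first (2.0). Applying the operator twice on the same index returns the original matrix.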

Oscar Hammar Göteborg University Department of Mathematics and Statistics [email protected]

“A New Technique to Prove Markov Equivalence”

Markov equivalence between certain types of chain graphs has been proven in the literature using path criteria. We use a matrix formulation for induced edges to simplify proofs a directed acyclic graph to chain graphs and to tackle new problems.

Saturday, November 11, 2006 9:00-1:00 PM

Arthur Dempster Harvard University Department of Statistics [email protected]

“High-Dimensional Networks as Conduits for Uncertain Information”

After discussing what network models are and are not, and sketching several applied areas of contemporary interest and importance, I argue that DS (Dempster-Shafer) methodology is a natural tool for modeling and analysis of complex networks because information needs to be propagated and fused from widely dispersed regions of the network, and information always remains completely absent regarding much of the network.

DS theory provides a natural and flexible extension of Bayesian methods, impartially mixing probabilistic uncertainty and logical degrees of “don’t know”. Important elements of the theory were published 40, 30, and 20 years ago, but computational feasibility is just now coming on line.

Hélène Massam York University Department of Mathematics and Statistics [email protected]

“The Conjugate Prior for Discrete Hierarchical Models”

In the Bayesian analysis of contingency table data, the selection of a prior distribution for either the log-linear parameters or the cell probabilities parameter is a major challenge. Though the conjugate prior on cell probabilities has been defined by Dawid and Lauritzen (1993) for decomposable graphical models, it has not been identified for the larger class of graphical models Markov with respect to an arbitrary undirected graph or for the even wider class of hierarchical log-linear models. Working with the log-linear parameters used by GLIM, we first define the conjugate prior for these parameters and then derive the induced prior for the cell probabilities: this is done for the general class of hierarchical log-linear models. We show that the conjugate prior has all the properties that one expects from a prior: notational simplicity, ability to reflect either no prior knowledge or a priori expert knowledge, a moderate number of hyperparameters and mathematical convenience. It also has the strong hyper Markov property which allows for local updates within prime components for graphical models.

Sonja Kuhnt Eindhoven University of Technology Department of Mathematics and Computer Science [email protected]

“Robust Model Selection for Graphical Models”

In every data set observations can occur which deviate from the pattern presented by the main part of the data. Such so-called outliers can severely influence the result of a model selection process. We will provide a formal definition of outliers with respect to CG-distributions, capturing the perception of outliers as surprising observations. Based on the usual likelihood equations we define a general class of modified likelihood estimators which allow for robust parameter estimation. Finally the combination of one-step outlier identification with well-known model selection strategies is discussed with a focus on undirected graphical models for mixed variables.

Frederick Eberhardt Carnegie Mellon University Department of Philosophy [email protected]

“Active Structure Search using Interventions”

The causal Bayes net framework (Pearl, 2000; Spirtes et al., 2000) provides a representation of causal structure that captures both the probabilistic relations between variables and the effect of interventions on variables. I will present a general model that captures a broad range of different types of interventions, which imply different experimental search strategies. I will provide worst-case bounds on the number of experiments necessary and sufficient to recover the true causal structure and describe some work in progress on identifying the shortest sequence of experiments.

D. Geometry, Random Matrices and Statistical Inference Workshop January 16-19, 2007

Tuesday - January 16, 2007 NISS Building, Room 104

Geometry and Sparsity 8:45 AM Carolina Livery will depart the Radisson for SAMSI. If you miss the Carolina Livery shuttle, please see the hotel representatives at the front desk to have the Radisson shuttle bring you.

9:00-9:30 AM Continental Breakfast and Registration

9:30-10:30 AM “Sparsity in High Dimensional Learning Problems” Vladimir Koltchinskii, Georgia Institute of Technology

10:30-11:00 AM Break

11:00-12:00 PM “Grouped and Hierarchical Selection through Composite Absolute Penalties (CAPs)” Guilherme Rocha, University of California-Berkeley

12:00-1:00 PM Lunch

1:00-2:00 PM “On Kernels, Energy and Metrics” Steven Damelin, Georgia Southern University

2:00 PM Discussion Chair: Misha Belkin, Ohio State University

3:00 PM Carolina Livery will depart SAMSI for the Radisson

Wednesday - January 17, 2007 NISS Building, Room 104

Geometry and Topology 8:45 AM Carolina Livery will depart the Radisson for SAMSI

9:00-9:30 AM Continental Breakfast and Registration

9:30-10:30 AM “Random Fields of Multivariate Test Statistics” Jonathan Taylor, Université de Montréal

10:30-11:00 AM Break

11:00-12:00 PM “The 3D Random Computed Tomography Structuring of Proteins and the Spectral Non-Linear ICA Algorithm” Amit Singer, Yale University

12:00-1:00 PM Lunch

1:00-2:00 PM “Computing Homology of High-Dimensional Point Clouds” Yuriy Mileyko, Duke University

2:00 PM Discussion Chair: Mauro Maggioni, Duke University

3:00 PM Carolina Livery will depart SAMSI for the Radisson

Thursday - January 18, 2007 NISS Building, Room 104

Machine Learning 8:45 AM Carolina Livery will depart the Radisson for SAMSI

9:00-9:30 AM Continental Breakfast and Registration

9:30-10:30 AM “A Geometric Perspective on Learning” Partha Niyogi, University of Chicago

10:30-11:00 AM Break

11:00-12:00 PM “Random Geometry and Statistical Translation in Text Analysis” Guy Lebanon, Purdue University

12:00-1:00 PM Lunch

1:00-2:00 PM “Projection Pursuit, Gaussian Scale Mixtures, and the EM Algorithm” Sanjoy Dasgupta, University of California-San Diego

2:00 PM Discussion

3:00 PM Carolina Livery will depart SAMSI for the Radisson

Friday - January 19, 2007 NISS Building, Room 104

Random Matrices and Covariances 8:45 AM Carolina Livery will depart the Radisson for SAMSI

9:00-9:30 AM Continental Breakfast

9:30-10:30 AM “Regularized Estimation of Large Covariance Matrices” Liza Levina, University of Michigan

10:30-11:00 AM Break

11:00-12:00 PM “Distributions for Random Positive Definite Matrices” Armin Schwartzman, Harvard School of Public Health

12:00-1:00 PM Lunch

1:00-2:00 PM “Multi-resolution Covariance Modelling in Spatial Random Effect Model” Tao Shi, Ohio State University

2:00 PM Discussion Chair: Sayan Mukherjee, Duke University

3:00 PM Carolina Livery will depart SAMSI for the Airport

Speaker Abstracts

Steven Damelin Georgia Southern University Department of Mathematical Sciences [email protected]

“On Kernels, Energy and Metrics”

In this talk, we link together important ideas of discrepancy, energy and metrics for large classes of symmetric, positive definite kernels defined on compact subsets of Euclidean space.

The kernels in question, such as logarithmic repulsive kernels, arise in computer vision, statistical inference and the study of fluctuations of eigenvalues of random matrices. The work is based partly on joint work of the author with Grabner (Graz), Levesley (Leicester), and Hickernell (IIT).

Sanjoy Dasgupta University of California-San Diego Department of Computer Science [email protected]

“Projection Pursuit, Gaussian Scale Mixtures, and the EM Algorithm”

The EM algorithm for fitting Gaussian mixture models is one of the most widely-used clustering methods. Yet, surprisingly little is known about its behavior. For instance, there are many different ways to initialize EM, and to merge/remove intermediate clusters, and the effects of these different strategies are not understood in a principled way.

I'll describe a new probabilistic analysis of EM. First of all, it will emerge that many common methods of initializing and running EM produce hopelessly suboptimal results even in ideal clustering scenarios. On the other hand, a particular variant of EM will provably recover a near-optimal clustering, provided that the clusters are adequately separated and that their distributions are of a certain fairly general form.

The type of cluster distributions allowed is motivated by a new result in projection pursuit, along the lines of the folklore that “all projections of high-dimensional data look Gaussian”. Specifically, I'll show that for any D-dimensional distribution with finite second moments, there is a precise sense in which almost all of its linear projections into d < D dimensions look like a scale-mixture of spherical Gaussians (concentric spherical Gaussians with the same center). The extent of this effect depends upon the ratio of d to D, and upon the “eccentricity” of the high-dimensional distribution.

Vladimir Koltchinskii Georgia Institute of Technology School of Mathematics [email protected]

“Sparsity in High Dimensional Learning Problems”

Learning problems (such as regression or classification) can often be formulated as penalized empirical risk minimization over the linear span of a very large dictionary of functions with a properly chosen complexity penalty. One choice of penalty that has been very popular in recent years is the $\ell_1$-penalty. This penalty function is convex and, if the loss function is also convex (which is the case in $L_2$- or $L_1$-regression as well as in large margin classification methods such as boosting or SVM), the penalized empirical risk minimization becomes a convex optimization problem. We will consider $\ell_1$-penalization as well as more general $\ell_p$-penalization with $p>1$ but close enough to $1$. A number of inequalities will be considered that relate the degree of “sparsity” of the empirical solution to the degree of “sparsity” of the true solution. Oracle inequalities showing the impact of “sparsity” on the excess risk of the empirical solution will also be discussed.
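A small sketch may help show why the $\ell_1$-penalty produces sparse solutions. It covers only a special case that is assumed here and is not taken from the abstract: squared-error loss in which the problem separates coordinate by coordinate, so that each coefficient minimizes 0.5 * (z - b)^2 + lam * |b|. The minimizer is the well-known soft-thresholding rule; the function name and the numbers are illustrative.

```python
def soft_threshold(z, lam):
    """Minimizer over b of 0.5 * (z - b) ** 2 + lam * abs(b)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Made-up unpenalized coefficients and penalty level.
coefficients = [3.0, -0.4, 0.1, -2.5]
shrunk = [soft_threshold(z, 1.0) for z in coefficients]
```

Coefficients whose magnitude is below the penalty level are set exactly to zero, while the others are shrunk toward zero by the penalty level.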

Guy Lebanon Purdue University Department of Statistics [email protected]

“Random Geometry and Statistical Translation in Text Analysis”

High dimensional structured data such as text and images is often poorly understood and misrepresented in statistical modeling. The standard histogram representation suffers from high variance and performs poorly in general. We explore novel connections between statistical translation, heat kernels on manifolds and graphs, and random geometries. These connections show a surprising link between the statistical ideas of regularization and biased estimation and common practices in text analysis.

Liza Levina University of Michigan Department of Statistics [email protected]

“Regularized Estimation of Large Covariance Matrices”

Estimation of covariance matrices has a number of important applications, which include principal component analysis, classification by linear or quadratic discriminant analysis, and inferring independence and conditional independence between variables. This talk will summarize recent results for regularizing covariance matrices when there is a natural ordering of the variables, and discuss extensions to cases when no such ordering is available. I will also discuss some questions (no answers!) on connections between covariance estimation and nonlinear dimension reduction.

Yuriy Mileyko Duke University Department of Computer Science [email protected]

“Computing Homology of High-Dimensional Point Clouds”

In this talk I will consider the problem of recovering homology groups of a subspace of a high-dimensional Euclidean space given only a finite number of points lying on or near the subspace. We approach this problem by constructing a specific family of nested simplicial complexes, witness complexes, and computing their persistent homology groups. I shall give an overview of witness complexes and persistent homology, and present several results and open questions.

Partha Niyogi University of Chicago Department of Computer Science [email protected]

“A Geometric Perspective on Learning”

Increasingly, we face machine learning problems in very high dimensional spaces. We proceed with the intuition that although natural data live in very high dimensions, they have relatively few degrees of freedom. One way to formalize this intuition is to model the data as lying on or near a low dimensional manifold embedded in the high dimensional space. This point of view leads to a new class of algorithms that are “manifold motivated” and a new set of theoretical questions that surround their analysis. A central construction in these algorithms is a graph or simplicial complex that is data-derived, and we will relate the geometry of these to the geometry of the underlying manifold. In particular, we will see the role of the Laplace-Beltrami operator in many of these developments. Applications to embedding, clustering, classification, and semi-supervised learning will be considered.

Guilherme Rocha University of California-Berkeley Department of Statistics [email protected]

“Grouped and Hierarchical Selection through Composite Absolute Penalties (CAPs)”

For datasets with many predictors and few samples, side information often must be added to the fitting. We introduce Composite Absolute Penalties (CAP) to blend predefined grouping and hierarchical information among the predictors into regression and classification. Special cases include Zou & Hastie (2005)’s elastic net, Kim et al. (2005)’s Blockwise Sparse Regression and Yuan & Lin (2006)’s GLASSO. CAPs are built by combining norm penalties at the across-group and within-group levels. For disjoint groups, a Bayesian interpretation lays bare the role of the norms used to construct CAP. Hierarchical selection is reached by defining nested groups. For general CAPs, we use the BLASSO and cross-validation to compute CAP estimates. For CAPs built from $L_1$ and $L_\infty$ norms, we give efficient algorithms and regularization selection criteria. The enhanced prediction performance of CAP estimates is shown through simulated experiments.

Armin Schwartzman Harvard School of Public Health Department of Biostatistics [email protected]

“Distributions for Random Positive Definite Matrices”

In classical multivariate statistics, the most common probability model for positive definite (PD) matrices is the Wishart distribution, which arises as the distribution of the sample covariance matrix of a multivariate normal sample. Randomness in general PD matrix data, however, can be of a very different nature. In Diffusion Tensor Imaging, for instance, PD matrix data is obtained directly from measurements of brain anatomy. In this talk I explore alternative ways to model random PD matrices. This is done keeping in mind three objectives: 1) Arbitrary covariance between matrix elements; 2) Data-analysis friendliness; 3) Match between the MLE for the mean parameter and the already known means for positive definite matrices, i.e. arithmetic, geometric and intrinsic. The new probability distributions obtained are the log normal, the Riemannian log normal and the geodesic normal, all based on the normal distribution for symmetric matrices followed by an appropriate matrix log transformation. The construction gives insights into the geometry of the PD manifold and ways to do statistics on more general manifolds.

Tao Shi Ohio State University Department of Statistics [email protected]

“Multi-resolution Covariance Modelling in Spatial Random Effect Model”

In spatial statistics, estimating and inverting the spatial covariance matrix is difficult when the sample size $n$ is large: the computational complexity of kriging is of order $n^3$. In this paper, we propose a Spatial Mixed Effects (SME) statistical model to predict the missing values, denoise the observed values, and quantify the spatial-prediction uncertainties. The computations associated with the SME model scale linearly with the number of data points, which makes it feasible to process massive global satellite data. We apply our proposed methodology, which we call Fixed Rank Kriging (FRK), to the level-3 Aerosol Optical Depth dataset collected by NASA’s Multi-angle Imaging SpectroRadiometer (MISR) instrument flying on the Terra satellite. Overall, our results were superior to those from nonstatistical methods and, importantly, FRK has an uncertainty measure associated with it.

This is joint work with Prof. Noel Cressie.

Amit Singer Yale University Department of Mathematics [email protected]

“The 3D Random Computed Tomography Structuring of Proteins and the Spectral Non-Linear ICA Algorithm”

In this talk we present a reconstruction algorithm for the 3D atomic structure of randomly oriented proteins using modified graph Laplacians. The 2003 Nobel Prize in Chemistry was co-awarded to R. MacKinnon who was the first to structure a protein channel (the potassium channel) in 1998 by crystallizing the protein and then using the classical x-ray computed tomography (CT). However, most membrane proteins cannot be crystallized, and the classical CT cannot be used. In our experimental setup, we are given real noisy 2D projections (electron microscope images) in random directions, because the proteins are randomly oriented rather than being aligned as in MacKinnon’s setup. Still, we show that the 3D reconstruction is made possible by a certain modification of the images’ graph Laplacian combined with the 3D Fourier slice theorem. The reconstruction is a particular case of a more general spectral non-linear independent component analysis (ICA) algorithm that combines local PCA with the graph Laplacian.

This is joint work with Ronald Coifman, Yoel Shkolnisky and Fred Sigworth (Yale University).

Jonathan Taylor Université de Montréal Department of Mathématiques et de Statistique [email protected]

“Random Fields of Multivariate Test Statistics”

The starting point of our talk is a study of anatomical differences between controls and patients who have suffered non-missile trauma. We use a multivariate linear model at each location in space, using Hotelling’s $T^2$ to detect differences between cases and controls. If we include further covariates in the model, Roy’s maximum root is a natural generalization of Hotelling’s $T^2$. This leads to the Roy’s maximum root random field, which includes many special types of random fields (Hotelling’s $T^2$, $T$, and $F$), so that, in effect, the Roy’s maximum root random field “unifies” many different random fields. We then describe some recent advances in both the “theory” and “application” of smooth random fields: the behaviour of the maximum of a smooth random field, specifically an integral-geometric “recipe” for the EC approximation; and, finally, some important recent applications of such approximations, from classical multivariate problems to perturbation models. This talk is based on joint work with Keith Worsley.

E. SUMMER PROGRAM ON MULTIPLICITY AND REPRODUCIBILITY IN SCIENTIFIC STUDIES

A. Opening Workshop July 10-12, 2006

Monday, July 10, 2006 Radisson Hotel Research Triangle Park 3rd Floor, Room H

8:15-9:10 AM Registration (Room H) Continental Breakfast (Room FG)

9:10-9:30 AM Welcome Jim Berger, SAMSI Stan Young, NISS

Session on False Discovery and Multiple Comparisons 9:30-10:30 AM “False Discovery Control” Larry Wasserman, Carnegie Mellon University

10:30-11:00 AM Coffee Break (Room FG)

11:00-12:00 PM “Inferences on the Proportion of Non-Null Effects in Large-Scale Multiple Comparisons” Jiashun Jin, Purdue University

12:00-1:15 PM Lunch (Room FG)

Session on Subgroup Analysis and High Dimensional Data 1:15-2:15 PM “Identifying Meaningful Patient Subgroups via Clustering – Sensitivity Graphics” Robert Obenchain, Eli Lilly and Company

2:15-3:15 PM “Large p Small n Asymptotics for Statistical Analysis of High Dimensional Data” Michael Kosorok, University of North Carolina-Chapel Hill

3:15-3:45 PM Coffee Break (Room FG)

3:45-4:45 PM View from the Food and Drug Administration Bob O’Neill, Director of Biostatistics, U.S. Food and Drug Administration Telba Irony, Center for Devices – Comment

4:45-5:15 PM Poster Session Advertisements

6:30-8:30 PM Poster Session (Room AB, 2nd Floor)

Tuesday, July 11, 2006 Radisson Hotel Research Triangle Park 3rd Floor, Room H

8:30-9:00 AM Registration (Room H) Continental Breakfast (Room FG)

Session on Multiplicity and Reproducibility 9:00-10:00 AM “Multiplicities: Minefields for Bayesians” Donald Berry, University of Texas M.D. Anderson Cancer Center

10:00-11:00 AM “Generalizing Some FWER and FDR Procedures” Sanat Sarkar, Temple University

11:00-11:15 AM Coffee Break (Room FG)

11:15-12:15 PM “The Role of FDR Based Inference in Ensuring Reproducible Results” Daniel Yekutieli, Tel Aviv University

12:15-1:30 PM Lunch (Room FG)

Session on Multiplicity in Genomics 1:30-2:30 PM “Statistical Methods for Expression Quantitative Trait Loci Mapping” Christina Kendziorski, University of Wisconsin - Madison

2:30-3:30 PM “On Enrichment of a Gene Category for Altered Genes” Michael Newton, University of Wisconsin - Madison

3:30-4:00 PM Coffee Break (Room FG)

New Researchers Session 4:00-5:20 PM “Sharp Simultaneous Intervals for the Means of Selected Populations with Application to Microarray Data Analysis” Jing Qiu, University of Missouri

“Connections Between Local FDR and Alternative Hypotheses” Ken Rice, University of Washington

“Some Multiplicity and Reproducibility Problems in Genome-Wide Genetic Studies” Lei Sun, University of Toronto

Wednesday, July 12, 2006 Radisson Hotel Research Triangle Park 3rd Floor, Room H

8:30-9:00 AM Continental Breakfast (Room FG)

Session on Bayesian Views of Multiple Testing 9:00-10:00 AM “A Bayesian Approach to the Estimation of False Discovery Rates” Subhashis Ghosal, North Carolina State University

10:00-11:00 AM “Multiple Testing: Some Contrasts Between Frequentist and Bayesian Approaches” Susie Bayarri, University of Valencia

11:00-11:15 AM Coffee Break (Room FG)

11:15-12:15 PM “Modeling Dependence for Multivariable Experiments” Peter Hoff, University of Washington

12:15-1:15 PM Lunch (Room FG)

1:15-2:30 PM Discussion of Working Group on Multiplicity: Peter Mueller (Moderator), University of Texas M.D. Anderson Cancer Center Telba Irony, U.S. Food and Drug Administration Helmut Finner, German Diabetes Center

2:30-3:45 PM Discussion of Working Group on Subgroup Analysis: Juliet Shaffer (Moderator), University of California - Berkeley Alok Krishen, GlaxoSmithKline Gary Rosner, University of Texas M.D. Anderson Cancer Center S. Sivaganesan, University of Cincinnati

3:45-4:15 PM Coffee Break (Room FG)

4:15-5:30 PM Discussion of Working Group on Replicability: Stan Young (Moderator), NISS Steve Goodman, Johns Hopkins University Valen Johnson, University of Texas M.D. Anderson Cancer Center

Speaker Abstracts

Susie Bayarri University of Valencia Department of Statistics [email protected]

“Multiple Testing: Some Contrasts Between Frequentist and Bayesian Approaches”

Donald Berry University of Texas M.D. Anderson Cancer Center Department of Biostatistics and Applied Mathematics [email protected]

“Multiplicities: Minefields for Bayesians”

Multiplicities such as multiple comparisons, subset analyses, and model selection are well understood by Bayesians--or so they think! The Bayesian approach is inherently synthetic, and that’s wonderful. But this positive attribute makes it subject to multiplicity biases that are poorly understood by Bayesians, including by me. Something as natural to a Bayesian as using historical information is fraught with multiplicities--and danger! Frequentist analyses are also subject to multiplicities, but since frequentists focus on individual experiments, associated multiplicities are both better understood and easier to handle. Bayesians who address only those multiplicities that are recognized by frequentists as being important are missing more insidious and more important multiplicities. A mantra of the Bayesian approach is that since conclusions do not depend on the experiment’s design--or at least, they depend much less than do frequentist conclusions--the Bayesian approach is more flexible and results can be analyzed in settings where frequentists have to throw up their hands. But there are mines along the Bayesian way that don’t exist for frequentists. I will describe some of these mines and suggest how we might be able to defuse them.

Subhashis Ghosal North Carolina State University Department of Statistics [email protected]

“A Bayesian Approach to the Estimation of False Discovery Rates”

False discovery rates (FDR) are important measures of errors related to multiple testing problems. We argue that a Bayesian model for estimating FDR with a Dirichlet mixture of beta prior on p-values is natural. We describe an algorithm for computing the posterior distribution. Some theoretical justification of the beta mixture model through posterior consistency will be given. Simulation studies will be presented to show that the Bayesian estimator gives more accurate estimates than the procedures proposed in the literature so far. We shall also illustrate the proposed method by an application to a leukemia data set.

Peter Hoff University of Washington Department of Statistics and Biostatistics [email protected]

“Modeling Dependence for Multivariable Experiments”

We discuss a model-based approach to identifying clusters of objects based on subsets of attributes, so that the attributes that distinguish a cluster from the rest of the population may depend on the cluster being considered. The method is based on a Polya urn cluster model for multivariate means and variances, resulting in a multivariate Dirichlet process mixture model. This particular model-based approach accommodates outliers, a variety of data types (continuous, count and binary) and allows for the incorporation of application-specific data features into the clustering scheme. For example, in an analysis of genetic CGH array data we are able to design a clustering method that accounts for spatial dependence of chromosomal abnormalities.

Jiashun Jin Purdue University Department of Statistics [email protected]

“Inferences on the Proportion of Non-null Effects in Large-scale Multiple Comparisons”

The immediate need for effective massive data mining gives rise to a new field in statistics: large-scale multiple simultaneous testing or multiple comparisons, in which one tests thousands or even millions of hypotheses simultaneously. A problem of particular interest is the inference on the proportion of non-null effects, i.e., the proportion of hypotheses that are untrue.

In this talk, we consider a situation where the proportion of non-null effects is very small, and study (a) how to reliably tell whether the proportion is truly zero or not, and (b) how to precisely estimate the proportion. We introduce three recent inference tools: the Higher Criticism, Meinshausen and Rice’s confidence lower bound, and Cai, Jin, and Low’s confidence lower bound. Among them, Higher Criticism is for testing whether the proportion is truly zero or not, and the other two are for estimating the proportion. We discuss how these tools are intellectually connected, and compare their strengths and weaknesses.
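As an illustration of the testing tool named above, the following is a minimal sketch of one common form of the Higher Criticism statistic computed from a list of p-values. The function name `higher_criticism` and the parameter `alpha0` (the fraction of smallest p-values searched) are assumptions made for this example and are not taken from the talk.

```python
import math


def higher_criticism(p_values, alpha0=0.5):
    """Maximum standardized gap between i/n and the i-th smallest p-value.

    Only the smallest alpha0 * n p-values are searched. Returns -inf if every
    searched p-value is exactly 0 or 1 (the standardization is undefined there).
    """
    n = len(p_values)
    sorted_p = sorted(p_values)
    best = float("-inf")
    for i in range(1, max(1, int(alpha0 * n)) + 1):
        p = sorted_p[i - 1]
        if 0.0 < p < 1.0:  # skip values that would give a zero denominator
            score = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
            best = max(best, score)
    return best


# Evenly spread p-values (no signal) versus the same list with five tiny p-values.
null_like = [i / 101 for i in range(1, 101)]
with_signal = [1e-6] * 5 + [i / 101 for i in range(6, 101)]
print(higher_criticism(null_like), higher_criticism(with_signal))
```

A large value suggests that the proportion of non-null effects is not zero; calibrating a rejection threshold is outside the scope of this sketch.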

This talk is based on some recent works in collaboration with (alphabetically) Tony Cai, David Donoho, Mark Low, Jie Peng, and Pei Wang.

Cai, T. and Jin, J. and Low, M. (2005). Estimation and Confidence Sets For Sparse Normal Mixtures. accepted by Ann. Statist.

Donoho, D. and Jin, J. (2004). Higher Criticism for Detecting Sparse Heterogeneous Mixtures. Ann. Statist., Vol 32, No. 3, 962-994.

Meinshausen, N. and Rice, J. (2004). Estimating the Proportion of False Null Hypotheses among a Large Number of Independently Tested Hypotheses. Ann. Statist., Vol 34, No. 1, 373-393.

Christina Kendziorski University of Wisconsin-Madison Department of Biostatistics [email protected]

“Statistical Methods for Expression Quantitative Trait Loci Mapping”

The development of statistical methods for mapping quantitative traits has received considerable attention. Effective methods now exist to account for different types of crosses or family structures, different kinds of phenotype, the presence of multiple genes affecting the trait and their genetic interactions, and the multiple testing issues that arise from tests at many markers. A number of groups have recently applied these QTL methods to the problem of mapping mRNA abundance measurements generated from microarrays by considering each individual transcript as a quantitative trait. However, most QTL mapping methods were developed to address the case where a small number of traits (oftentimes, just one) are being mapped. In expression QTL (eQTL) mapping, thousands of traits are considered simultaneously and the repeated application of individual tests is not the most efficient strategy. Multiple tests across transcripts are not accounted for, and information common across transcripts is lost. I will present an empirical Bayes modeling approach to enable eQTL mapping. The inefficiency of the single trait method and the utility of the proposed method are demonstrated using microarray and genotype data from an F2 mouse cross in a study of diabetes.

Michael Kosorok University of North Carolina-Chapel Hill Department of Biostatistics [email protected]

“Large p Small n Asymptotics for Statistical Analysis of High Dimensional Data”

False discovery rate (FDR) techniques are proving to be extremely useful in microarray studies, image analysis, high throughput molecular screening, astronomy, and in many other applications involving high dimensional outcome data. Consider, for example, a cDNA microarray study where p-values are computed for each of p genes using data from n arrays. For FDR methods to be valid for identifying differentially expressed genes, it is necessary that the p-values for the non-differentially expressed genes simultaneously have uniform distributions marginally. While this is feasible for permutation based p-values, it is unclear whether this also holds for p-values based on asymptotic approximations or based on post-normalized data. The issue is that the number of p-values involved goes to infinity and intuition suggests that at least some of the p-values should behave erratically. We examine this neglected issue when n is allowed to increase slowly and p is allowed to increase almost exponentially relative to n. We show the somewhat surprising result that the p-values, under very general dependency structures and for a variety of marginal test statistics and normalization procedures, are indeed simultaneously valid in a manner which allows accurate control of FDR. We apply this result to establish validity of a least-absolute-deviation method for normalization and significance analysis which is robust to contamination in the expression levels. The practical utility of the proposed method is demonstrated with an analysis of human placenta cDNA microarray data.

Michael Newton University of Wisconsin Department of Statistics [email protected]

“On Enrichment of a Gene Category for Altered Genes”

Consider a microarray study comparing gene expression between two cellular conditions. Having scored genes for differential expression (DE), the investigator wishes to identify categories of genes that are enriched for DE. Gene Ontology (GO) categories provide a case in point; each GO category is a collection of genes associated with a specific biological concept. Category information is critical in the integration of microarray data with other genomic data. Typically, Fisher’s exact test or Pearson’s chi-square test is used to assess independence of the GO categorization and the DE status. An alternative conditional testing procedure uses quantitative DE scores rather than binary test results. Binary scoring seems to be inferior because of information loss, but analysis reveals that each approach has a domain of superiority in the parameter space. (Key parameters include the average magnitude of the DE effect per gene, and the magnitude of the enrichment.) The result has connections to the comparison of model selection and model averaging. Further, empirical findings demonstrate the utility of these approaches in a microarray study of nasopharyngeal carcinoma, where gene sets are found related to the immune system’s association with Epstein-Barr viral gene expression. Finally, an algorithm is developed and analyzed which bounds the posterior probability of category enrichment starting with gene-specific q-values. I will highlight problems caused by both the multiplicity of genes and the multiplicity of GO categories.
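The binary-scoring approach mentioned above can be made concrete with a small sketch of the one-sided Fisher exact test for over-representation, which is the upper tail of a hypergeometric distribution. The function name `enrichment_p_value` and its argument names are hypothetical, chosen only for this illustration; the quantitative-score procedure discussed in the talk is not reproduced here.

```python
import math


def enrichment_p_value(total_genes, de_genes, category_size, de_in_category):
    """P(X >= de_in_category) when category_size genes are drawn without
    replacement from total_genes, of which de_genes are called DE."""
    upper = min(category_size, de_genes)
    favorable = 0
    for x in range(de_in_category, upper + 1):
        # Ways to pick x DE genes and (category_size - x) non-DE genes.
        favorable += math.comb(de_genes, x) * math.comb(
            total_genes - de_genes, category_size - x
        )
    return favorable / math.comb(total_genes, category_size)


# Example: 1000 genes, 100 called DE, a category of 20 genes containing 8 DE genes.
print(enrichment_p_value(1000, 100, 20, 8))
```

A small p-value indicates more DE genes in the category than expected by chance; with many categories tested, these p-values would themselves need a multiplicity adjustment.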

Robert L. Obenchain Eli Lilly and Company Outcomes Research, US Medical [email protected]

“Identifying Meaningful Patient Subgroups via Clustering - Sensitivity Graphics”

The Local Control (LC) approach to estimation of treatment effects on humans is based upon a treatment-within-cluster nested ANOVA model that is frequently less restrictive and more robust than traditional Covariate Adjustment (CA) models. Because LC systematically forms subgroups, compares subgroups and (following overshooting) recombines subgroups, LC proceeds via built-in sensitivity analyses. Specifically, the analyst views graphical displays that identify the most effective treatment in each patient subgroup within a family generated by varying the number of clusters, the clustering metric and/or the clustering (unsupervised learning) algorithm. The LC approach uses generalized definitions of Treatment Effects, TEs, and main effects. TEs are distributions of local main effects, and the overall Main Effect (ME) of treatment is the unknown true mean of all such local TE distributions. It is then natural to ask whether [a] an observed TE distribution could be a mixture of two or more well defined sub-distributions and whether [b] it is possible to predict the numerical size or sign of local main effects directly from the baseline patient X-characteristics used to define clusters. Rather than making strong, potentially unrealistic assumptions via generalized linear CA models, the LC approach can provide robust yet powerful insights into all sorts of head-to-head treatment comparisons processing information from sources ranging from massive administrative claims databases to highly restrictive, well-controlled clinical trials.

Jing Qiu University of Missouri-Columbia Department of Statistics [email protected]

“Sharp Simultaneous Intervals for the Means of Selected Populations with Application to Microarray Data Analysis”

Simultaneous inference is a challenge when the number of comparisons, N, is large. In some situations, such as microarray experiments, the means of N populations are estimated. However, we are interested only in the K populations that have the most extreme estimates, and wish to construct simultaneous confidence intervals for the means of these K selected populations. The naïve simultaneous confidence intervals for the K means (applied directly without taking into account the selection) have low coverage probabilities. We take an empirical Bayes approach (or an approach based on the random effect model) to construct simultaneous confidence intervals with good coverage probabilities. For N=10,000 and K=100, typical for microarray data, our confidence intervals could be 77% shorter than the naïve K-dimensional simultaneous intervals.

Kenneth Rice University of Washington Department of Biostatistics [email protected]

“Connections Between Local FDR and Alternative Hypotheses”

The local False Discovery Rate was introduced by Efron and co-authors (2001, 2005). However, Bayesians will recognize it as the posterior probability that data came from the ‘null’, if we specify a model with ‘null’ and ‘alternative’ components. In yet another field, such ‘contaminated’ distributions underpin robust analysis, introduced by Huber (1964), where the aim is to estimate properties of a parametric distribution in the presence of ‘outliers’. To connect these fields, we consider the traditional Huber-style estimates in the newer local FDR framework. This provides a new way to interpret Huber’s approach, as a particular form of outlier-distribution. More generally, it also suggests attractive properties for likelihood-based modeling of situations where part of the data is not null.
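The Bayesian reading of the local FDR described above can be sketched for a two-component model. The normal ‘null’ and ‘alternative’ components and the names `local_fdr`, `pi0`, `alt_mean` and `alt_sd` are assumptions made for this illustration only; the talk does not specify these choices.

```python
import math


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, pi0, alt_mean, alt_sd=1.0):
    """Posterior probability that z came from the N(0, 1) 'null' component,
    given prior null proportion pi0 and an assumed normal 'alternative'."""
    null_part = pi0 * normal_pdf(z)
    alt_part = (1.0 - pi0) * normal_pdf(z, alt_mean, alt_sd)
    return null_part / (null_part + alt_part)


# With 90% nulls and an alternative centered at 3, compare z = 0 with z = 4.
print(local_fdr(0.0, 0.9, 3.0), local_fdr(4.0, 0.9, 3.0))
```

An observation near zero is almost certainly ‘null’ under these assumptions, while one far in the tail has a small local FDR and would be flagged as an ‘outlier’ in the robust-analysis reading.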

Sanat Sarkar Temple University Department of Statistics [email protected]

“Generalizing Some FWER and FDR Procedures”

The traditional notion of familywise error rate (FWER) has been generalized to that of k-FWER, and procedures controlling it have been developed in the literature to accommodate situations in multiple testing where one is willing to tolerate a few false rejections and wants to control the probability of k or more of them, for some fixed k ≥ 1. Some newer k-FWER procedures, often more powerful than those proposed recently in the literature, will be introduced in this talk. In addition, an alternative and less conservative notion of error rate, the k-FDR, a generalization of the usual false discovery rate and defined in the same spirit as the k-FWER, will be introduced and procedures that control it will be presented.
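To make the k-FWER notion concrete, here is a minimal sketch of the simplest procedure of this kind, the single-step generalized Bonferroni rule usually attributed to Lehmann and Romano, which rejects a hypothesis when its p-value is at most k·α/n. This is not one of the newer procedures announced in the talk, and the function name `generalized_bonferroni` is hypothetical.

```python
def generalized_bonferroni(p_values, alpha=0.05, k=1):
    """Indices rejected by the single-step rule p_i <= k * alpha / n.

    With k = 1 this is the ordinary Bonferroni correction (FWER control);
    larger k tolerates up to k - 1 false rejections.
    """
    n = len(p_values)
    threshold = k * alpha / n
    return [i for i, p in enumerate(p_values) if p <= threshold]


p = [0.001, 0.012, 0.02, 0.5]
print(generalized_bonferroni(p, k=1))  # threshold 0.0125
print(generalized_bonferroni(p, k=2))  # threshold 0.025
```

Allowing k = 2 relaxes the threshold and admits one more rejection in this example, which illustrates the power gained by tolerating a few false rejections.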

Lei Sun University of Toronto Departments of Public Health Sciences and Statistics [email protected]

“Some Multiplicity and Reproducibility Problems in Genome-Wide Genetic Studies”

Current high-throughput genotyping technology provides efficient means to collect vast amounts of data at low cost. However, it also poses many interesting and challenging statistical and computational problems, in particular the problem of multiple hypothesis testing and the difficulty of replicating initial claims of gene discoveries.

Using a recent Genome-Wide Association (GWA) study of Parkinson’s disease by Maraganore et al. (2005) as an example, in which 198,345 SNPs were tested for association and 1,906 identified with p-values < 0.01, I will present our recent work on the control of type II error rate and a stratified false discovery control approach in the context of multiple hypothesis testing (Craiu and Sun, 2006; Sun et al., 2006). I will also propose some resampling-based methods to address the issue of overestimating the importance of genetic effect which has significant consequences in the design and analysis of replication studies (Sun and Bull, 2005).

References:

Maraganore DM, de Andrade M, Lesnick TG, Strain KJ, et al. (2005). High-resolution whole-genome association study of Parkinson disease. American Journal of Human Genetics 77:685-693.

Craiu RV, Sun L (2006). Choosing the lesser evil: trade-off between false discovery rate and non-discovery rate. Statistica Sinica (to appear).

Sun L, Craiu RV, Paterson AD and Bull SB (2006). Stratified false discovery control for large-scale hypothesis testing with application to genome-wide association studies. Genetic Epidemiology (in press).

Sun L, Bull SB (2005). Reduction of selection bias in genomewide genetic studies by resampling. Genetic Epidemiology 28:352-367.

Larry Wasserman Carnegie Mellon University Department of Statistics [email protected]

“False Discovery Control”

I will introduce false discovery control methods. These are testing methods that control the false discovery proportion (FDP), which is the number of false rejections divided by the number of rejections. I begin with the Benjamini and Hochberg method, which controls the false discovery rate (FDR), that is, the mean FDP. Then I will discuss numerous extensions that have been proposed for the basic method. These include: dealing with dependence, power enhancements, weighted methods, controlling quantiles of the FDP, and controlling FDP in spatial problems.
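The two quantities defined above can be sketched in a few lines. The code below implements the standard Benjamini and Hochberg step-up rule and the FDP of a rejection set; the function names and the convention that the FDP is 0 when nothing is rejected are choices made for this illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices rejected by the BH step-up rule at target FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose ordered p-value satisfies p_(k) <= k * q / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    # Reject the cutoff smallest p-values, even if some earlier ranks failed the check.
    return sorted(order[:cutoff])


def false_discovery_proportion(rejected, true_nulls):
    """Number of false rejections divided by the number of rejections (0 if none)."""
    if not rejected:
        return 0.0
    false_rejections = sum(1 for i in rejected if i in true_nulls)
    return false_rejections / len(rejected)


p = [0.02, 0.03, 0.035, 0.9]
rejected = benjamini_hochberg(p, q=0.05)
print(rejected, false_discovery_proportion(rejected, true_nulls={2, 3}))
```

In the example the first two p-values fail their own thresholds, yet all three small p-values are rejected because the third one passes; this step-up behaviour is what distinguishes the rule from a simple per-rank cutoff.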

Daniel Yekutieli Tel Aviv University Department of Statistics and Operations Research [email protected]

“The Role of FDR Based Inference in Ensuring Reproducible Results”

FDR controlling methods have been found to be very successful in providing reproducible results when applied to a single, very large, family of hypotheses. In this talk I will show how FDR control can be applied in more general settings: large-scale studies which involve several research directions with multiple families of tested hypotheses. I will then focus on the discoveries – the inference provided by FDR controlling methods, and on the FDR property itself, and try to explain why FDR-significant results can be considered reproducible results. I will present inherent weaknesses of FDR methodology, and try to explain why I think that FDR methodology and Bayesian inference may be incompatible.

Poster Abstracts

Delong Liu and Kevin W. Gaido CIIT Centers for Health Research [email protected]

“Modeling Effect of Nucleotide Compositions on Expression Levels of Probes Due to Non-specific Binding on Affychips”

Background: Affy GeneChips (Affychips) have been widely used in genome research. Large signal intensity variation of genes with low expression levels across a probe set on an Affychip has raised concerns about quality control in analysis of gene expression data. One way to address this issue is to study the effect of nucleotide composition of a probe on non-specific binding measured from its background signal intensity (at 0 pM concentration of its target cRNA).

Results: We proposed a linear model to approximate the contribution of nucleotide composition of a probe to its signal intensity due to non-specific binding. The model includes three sets of predictors: the contribution of each nucleotide on a probe; the contribution of linearly-dependent position of each nucleotide; and the contribution from two adjacent nucleotides on a probe. We applied the linear model to measured background signal intensities of 498 perfect match (PM) probes of 42 spike-in genes at 0 concentration of target cRNAs on Affy HG-U133 chips, and fitted the linear model using the Least Angle Regression (LARS) method. We then used the trained model to predict the background intensities of all the PM probes on the spike-in chip. The predicted high background intensities correlate with the observed high signal intensities for the PM probes clustered in one direction on the spike-in chips.

Conclusion: Our analysis on the 498 spike-in PM probes suggests that different nucleotide compositions within a probe set can lead to large variation in background intensities. Spatial positions of nucleotides on a probe should be considered in modeling non-specific binding on Affychips.

Fei Liu, Feng Liang, and Woncheol Jang Duke University Institute of Statistics and Decision Sciences [email protected]

“A Bayesian Method for Partially Paired Gene Expression Data”

Microarray technology is often applied in controlled studies where measurements are simultaneously taken for thousands of gene expressions on matched pairs. For completely paired data, Bayesian methods provide a natural solution with small sample size. However, gene expression data often have missing values, which leave some of the observations unmatched. With the small sample sizes typical of gene expression data, excluding those unpaired observations from the analysis may lead to significant information loss. We generalize the Bayesian method to partially paired gene expression data, which provides a natural way to incorporate the information from the unmatched observations for better estimation.

Vered Madar Tel-Aviv University Department of Statistics and Operations Research [email protected]

“Simultaneous Confidence Intervals for Multiple Parameters with More Power to Determine the Sign”

We offer new simultaneous two-sided confidence intervals for the estimation of the expectations of k normal random variables. We invert a family of rectangular acceptance regions that have minimal incursion into the other quadrants into a confidence set, and project its convex hull onto the axes to get simultaneous confidence intervals. Besides offering simultaneous coverage, the new intervals also provide slightly stronger sign classification than offered by the conventional simultaneous two-sided intervals. As we shall illustrate, the new intervals are very useful for making inference in a clinical trial that has several primary endpoints.

James Scott Duke University Institute of Statistics and Decision Sciences [email protected]

Title and Abstract TBD

Song Zhang University of Texas M.D. Anderson Cancer Center Department of Biostatistics and Applied Mathematics [email protected]

“A CAR-BART Model to Merge Two Datasets”

One problem frequently encountered by public health researchers and health planners is the absence of socioeconomic data in many widely used and routinely collected sources of health and disease information. A common practice to solve this problem is to supplement individual-level records with the socioeconomic profile of the immediate neighborhood of the individual’s residence. In this study, we are interested in the relationship between self-perceived health status and income, but they are only available in two different datasets. A Bayesian hierarchical model is built to merge two datasets. We extend the Bayesian additive regression trees (BART) model by incorporating additional spatial effects. A simulation study and real data analysis are presented.

Xinge Zheng Purdue University Department of Statistics [email protected]

“Higher Criticism for Detecting Sparse Heterogeneous and Heteroscedastic Mixtures”

Higher Criticism is a statistic recently proposed in [1], which concerns the situation of detecting very sparse signals. The regime of very sparse signals is of great interest, as on one hand it exposes many new phenomena not present in the non-sparse situation, and on the other hand many conventional detection tools cease to work well in this regime. Despite the subtlety of the situation, Higher Criticism is found to be effective and it is in fact shown in [1] to be optimal in detecting heterogeneous (but homoscedastic) normal mixtures.

In this paper, we extend the study to the heteroscedastic situation, and study whether the optimality of Higher Criticism continues to hold. In detail, we consider n samples from a normal mixture $X_i \overset{\text{iid}}{\sim} (1 - \varepsilon_n) N(0,1) + \varepsilon_n N(\mu_n, \sigma^2)$, and are interested in testing $\varepsilon_n = 0$ vs. $\varepsilon_n \neq 0$. Similarly to [1] we calibrate $\varepsilon_n = n^{-\beta}$ and $\mu_n = \sqrt{2 r \log n}$ ($1/2 < \beta < 1$ and $0 < r < 1$) so that the situation is subtle but still detectable. We are particularly interested in the influence of the heteroscedastic parameter $\sigma$ on the testing problem.

We describe the precise demarcation between detectable and undetectable: for which (β, r, σ) it is asymptotically possible to reliably tell that $\varepsilon_n \neq 0$, and for which it is impossible to do so. We found that, surprisingly, despite the fact that different σ give dramatically different demarcation, Higher Criticism continues to be optimal for detection and adaptive to different choices of σ.

References [1] Donoho, D. and Jin, J. (2004). Higher Criticism for Detecting Sparse Heterogeneous Mixtures. Ann. Statist., Vol 32, 3, 962-994.

F. EDUCATION & OUTREACH (2006-07)

A. Industrial Mathematical and Statistical Modeling Workshop for Graduate Students July 24-August 1, 2006

Sunday, July 23 Arrival 7:00pm Welcome greet & eat for students in Sullivan Lounge (Subs from Two Guys)

Monday, July 24

7:30 - 8:30 Breakfast (Fountain Dining Hall, or on own)

8:30 - 9:00 Coffee and soft drinks Harrelson Hall, Room 245

9:00 – 12:30 Problems presented to whole group Harrelson Hall, Room 314

12:30 - 1:30 Lunch

1:30 - 5:00 Working Session

5:30 – 7:00 Pizza Party at Two Guys Restaurant Hillsborough Street

Tuesday, July 25 AM & PM Working Session

Wednesday, July 26 AM & PM Working Session

Thursday, July 27 AM & PM Working Session

Friday, July 28 AM Working Session

PM Tour of Centennial Campus & Math Lab

Saturday, July 29 AM Working Session

11:15 Pick up at dorm for Picnic at Lake Crabtree (Barbecue, Southern Style)

5:30 Van pick up for Baseball Game Durham Bulls Athletic Park (Durham Bulls vs. Rochester Red Wings)

Sunday, July 30 Free time

Monday, July 31 AM & PM Working Session

Tuesday, August 1 AM Working Session

1:00 Formal presentation of results (Harrelson Hall, Room 314)

6:00 - 9:00 Dinner at the University Club

Problem Presentation Session Monday July 24, 2006 Harrelson Hall, Room 314

9:00-9:15 Opening Remarks Prof. Mansoor Haider, Chair, IMSM 2006 Prof. Loek Helminck, Head, Department of Mathematics, NCSU Prof. Ralph Smith, Associate Director, Statistical & Applied Mathematical Sciences Inst., Associate Director, Center for Research in Scientific Computation, NCSU

9:20-9:40 Problem 1 – Website Volume Prediction Carrie Ward, Advertising.com

9:45-10:05 Problem 2 – Optimal Implied Views for Skewed Return Distributions Randy Miller, Bank of America

10:10-10:30 Problem 3 – Bias Modeling in State Vector Estimation Jon Kennedy, MIT-Lincoln Lab

10:30-10:45 Break & Refreshments – Harrelson Hall, Room 245

10:50-11:10 Problem 4 – Iterative Refinement Method for a Planning Problem Involving Resource Constraints Pierre Maldague, Jet Propulsion Laboratory

11:15-11:35 Problem 5 – Properties of a Gradient Descent Algorithm for Active Vibration Control Mark Jolly, Lord Corporation

11:40-12:00 Problem 6 – Analysis of Biological Interaction Networks for Drug Discovery Laura Potter and Chetan Gadgil, GlaxoSmithKline

Problem 1. Website Volume Prediction Presenter: Carrie Ward, Advertising.com 1020 Hull Street Baltimore, MD 21230 p. (410) 244-1370 x11369 [email protected] Team Members: Richard Barnard, Paul Diver, Roxana Hritcu, Asuman Turkmen, Joe Zhang, Gang Zhao Faculty Consultants: Amy Langville, Zhilin Li Working Session: Room 345 Harrelson

Advertising.com’s extensive network of website placements and advertisements allows for unrivaled reach in the market of online advertising. Additionally, Advertising.com is able to offer strategic targeting as well as real-time optimization of ad placements in order to meet specific client objectives. At the core of the technology that makes this possible is a mathematical algorithm that determines optimal placements based on observed and expected response, while maintaining given constraints such as budget, targeting, and user frequency. In order to allocate advertisements accordingly, accurate estimates of the volume on the available web space in the network are a necessity. There are various challenges in that regard, as there are many factors that can influence volume trends. For example, website publishers may often sell their space directly to advertisers or to other networks, which can cause observable state changes in volume behavior. Another main contributor to volume fluctuations is natural variation over time, such as time-of-day or day-of-week behavior. The goal of the workshop is to identify volume prediction methods that encompass both the natural tendencies and changes in publisher behavior, in order to provide an accurate representation of volume on the network. This can be achieved by simulating data and modeling specific “stress test” scenarios. Lastly, a set of alarms can be developed for when it is believed that a change in steady behavior is observed. The desired areas of interest include probability, statistics, programming, control theory, and signal processing.

Problem 2. Optimal Implied Views for Skewed Return Distributions Presenters: Randy Miller SVP, Global Portfolio Strategies Bank of America NC1-002-04-21 101 S. Tryon St. Charlotte, NC 28255 p. (704) 386-5404 [email protected] Team Members: Robertas Gabrys, Weihua Geng, Rachel Koskodan, Jing Li, Luke Owens, Qi Wu, Miao Zuo Faculty Consultants: Tao Pang, Jeff Scroggs Working Session: Room 130 Harrelson

Background. The Black-Litterman model provides a method for introducing portfolio manager views into an optimal portfolio construction model.1 The optimal solutions under manager views are anchored to a benchmark portfolio. The model finds optimal portfolio weights that deviate from the benchmark weights based on portfolio manager views on expected returns relative to benchmark returns, and on the degree of confidence in the managers’ views. Optimal portfolios are derived within a conventional mean-variance framework, i.e., optimal portfolio weights result from an optimal balance of expected returns and risk where risk is measured as return variance-covariance. The Black-Litterman model also can be solved in reverse to determine “implied views” on returns associated with a given set of portfolio weights relative to benchmark weights. In other words, one can determine the expected returns required for the observed actual portfolio weights to be optimal within the Black-Litterman framework. In practical applications, implied views on expected returns are a powerful way of representing the implications of observed portfolio weights. They highlight circumstances where the implied expected returns deviate too

far from expected return norms, and accordingly, identify relatively extreme positions in portfolios and potential opportunities for portfolio rebalancing. Credit portfolio (loans, bonds, etc.) return distributions exhibit fat tails and typically are highly skewed. Variance-covariance may not be a satisfactory risk metric for such asymmetric portfolios. Alternative risk metrics (value at risk, conditional value at risk, expected shortfall) have been proposed that typically focus more on the tail risk of credit portfolios. In addition, methods have been devised for constructing optimal portfolios using the alternative tail risk metrics, or by taking skewness and kurtosis directly into account during optimal portfolio construction.2 The solutions to optimizations with tail risk metrics generally differ materially from those based on variance-covariance risk metrics.

1 Black, F. and Litterman, R., “Asset Allocation: Combining Investor Views with Market Equilibrium”, Working Paper, Goldman Sachs, Fixed Income Research, 1990. (There are many other descriptions of the B-L model in textbooks and other academic papers.)

2 Rockafellar, R.T. and Uryasev, S., “Optimization of Conditional Value at Risk”, mimeo, September 5, 1999; and related papers.

Problem. The problem we wish to examine is whether it is possible to reverse solve and derive implied expected return views for optimization models based on tail risk metrics. In other words, for a given set of portfolio weights, is it possible to derive implied returns consistent with a targeted value for a tail risk metric? We want to specifically target the optimization model based on conditional value at risk minimization.3 Alternative approaches include, but are not limited to, the following:
1. Derive an explicit, computationally feasible solution within the model framework as specified by Rockafellar and Uryasev.
2. Derive an optimal benchmark portfolio under a tail risk measure optimization. Derive the variance-covariance matrix for the benchmark portfolio, and then apply the Black-Litterman implied view methodology in the conventional way.
3. Derive implied views from some form of tracking-error minimization model.
4. Derive implied views from an optimization model that jointly accounts for the second, third, and fourth moments of the return distribution (Martellini et al.).

3 Ibid., Rockafellar and Uryasev, 1999.

Martellini, L., Matheiu, V., and Zeimann, V., “Investing in Hedge Funds: Adding Value Through Active Style Allocation Decisions,” Edhec Risk and Asset Management Research Centre, October 2005.
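For orientation, the two building blocks referred to above can be written down in a few lines. The sketch below is illustrative only and does not solve the posed problem: `implied_returns` is the standard mean-variance reverse optimization (implied excess returns proportional to the covariance matrix times the weights), and `var_cvar` evaluates the Rockafellar-Uryasev expression for conditional value at risk on a sample of losses. The risk-aversion coefficient, the covariance matrix, the weights and the loss sample are made-up inputs.

```python
import math

def implied_returns(cov, weights, risk_aversion):
    """Reverse mean-variance optimization: pi = risk_aversion * cov @ weights."""
    return [risk_aversion * sum(c * w for c, w in zip(row, weights))
            for row in cov]

def var_cvar(losses, alpha):
    """Empirical VaR and CVaR at confidence level alpha.

    CVaR is computed as c + mean(max(L - c, 0)) / (1 - alpha), the
    Rockafellar-Uryasev objective, evaluated at c = VaR where it is minimal.
    """
    ordered = sorted(losses)
    k = math.ceil(alpha * len(ordered)) - 1      # index of the alpha-quantile
    var = ordered[k]
    excess = sum(max(l - var, 0.0) for l in ordered) / len(ordered)
    return var, var + excess / (1.0 - alpha)

# Hypothetical two-asset portfolio
cov = [[0.04, 0.01],
       [0.01, 0.09]]
weights = [0.6, 0.4]
pi = implied_returns(cov, weights, risk_aversion=2.5)

# Hypothetical loss sample with a heavy right tail
losses = [-2, -1, 0, 1, 2, 3, 4, 5, 10, 20]
var80, cvar80 = var_cvar(losses, alpha=0.8)
```

For the loss sample shown, the 80% CVaR equals the average of the two largest losses. A mean-variance view of the same sample would not single out that tail, which is the motivation for the tail-risk formulations listed above.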

Problem 3. Bias Modeling in State Vector Estimation Presenter: Jon C. Kennedy Space Control Systems Group MIT Lincoln Laboratory 244 Wood Street Lexington, MA 02420-9108 (781) 981-1268 [email protected] Team Members: Alvaro Guevara, Mandar Kulkarni, Wanying Li, Wenjin Mao, Brent Morowitz, Nebojsa Murisic Faculty Consultants: Alina Chertock, Negash Medhin Working Session: Room 358 Harrelson


The estimation of six-dimensional state vectors from angles-only optical observations is critically important in areas ranging from missile defense to space surveillance. Given data corrupted by $N(0, \sigma^2)$ distributed noise, the methodology behind calculating an optimal estimate is well known. However, real sensors often provide observations with temporal and spatial biases which need to be dealt with in the fitting process. One way to do this is by augmenting the state vector with degrees of freedom to absorb the effect of the bias. What is not clear is exactly how one writes down the model for these parameters in general and how it fits into the canonical least squares state vector estimation. This project would concern the detailed modeling of these extra degrees of freedom and their optimal estimation.
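The idea of augmenting the state vector can be seen in a deliberately simplified setting. The sketch below assumes a one-dimensional target with position x0 + v t, one unbiased sensor and one sensor with a constant additive bias; the bias is appended to the state and all three parameters are fitted by ordinary least squares through the normal equations. The model, the function names and the numbers are illustrative assumptions, not the six-dimensional angles-only problem itself.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_with_bias(observations):
    """Least-squares estimate of the augmented state [x0, v, bias].

    Each observation is (time, value, from_biased_sensor). The third design
    column is 1 only for the biased sensor, so it absorbs that sensor's offset.
    """
    rows = [[1.0, t, 1.0 if biased else 0.0] for t, _, biased in observations]
    ys = [y for _, y, _ in observations]
    n = 3
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    return solve(ata, aty)

# Noise-free synthetic data: x0 = 2.0, v = 0.5, sensor B reads 0.3 too high
truth = (2.0, 0.5, 0.3)
obs = [(float(t), 2.0 + 0.5 * t, False) for t in range(6)]
obs += [(t + 0.5, 2.0 + 0.5 * (t + 0.5) + 0.3, True) for t in range(6)]
estimate = fit_with_bias(obs)
```

If every observation came from the biased sensor, the bias column would coincide with the constant column and the bias could not be separated from x0. Observability questions of this kind are part of what the project asks to be modeled carefully.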

Problem 4. Iterative Refinement Method for a Planning Problem Involving Resource Constraints Presenter: Pierre F. Maldague Mail Stop 301-250D Jet Propulsion Laboratory 4800 Oak Grove Dr. Pasadena, CA 91109 p. (818) 354-0152 [email protected] Team Members: Sergei Belov, Ye Chen, Aneesh Hariharan, David King, Jenny Law, Ting Wang Faculty Consultants: Kartik Sivaramakrishnan Working Session: Room 352 Harrelson

Summary of Problem. An idealized planning problem is presented, in which a set of activities is to optimize an objective function while satisfying a set of constraints expressed in terms of one or several numerical resources. The problem consists in evaluating the merits of an iterative approach in which the resource values are approximated by integers in the interval $[0, 2^B - 1]$, where the discretization parameter $B$ controls the accuracy of the approximation at each stage of the approach. The case $B = 1$, although far from trivial, lends itself to efficient heuristics that produce good results. The problem with this case is that the resources are modeled very crudely, since each resource is restricted to the values 0 and 1. As $B$ increases, resources are modeled more accurately but the problem grows in complexity, due to the difficulty of producing acceptable heuristics while keeping computation time under control. The challenge is to document the tradeoff between accuracy and complexity.

Statement of the problem. The context for this problem is the task of planning activities to be performed by a spacecraft as part of a space mission. A mission planner is expected to maximize the science return, while staying within the constraints that ensure safety of the spacecraft and success of the mission. This problem can be idealized as follows. The task is to arrange a number of activities so as to maximize a given objective function while satisfying a set of resource constraints. For our purpose, a resource is a numeric quantity that is a function of time, time being defined as any real number within the planning horizon $[S, E]$, where $S$ is the start of the plan and $E$ is the end of the plan. In the absence of activities, a resource $R$ has a constant value $R^0$, which is referred to as the default value of the resource. In our model, an activity is represented by a time interval contained within the planning horizon. The time interval corresponding to activity $A$ is written as $[S_A, E_A]$. The duration of activity $A$ is $E_A - S_A$.

An activity can impact a resource $R$ in two distinct ways, depending on whether $R$ is consumable or non-consumable:
• a consumable resource (e.g., fuel or data volume) is depleted by an amount $r_A$ at the start of an activity $A$ that uses it. If $A$ is the only activity in the plan, the resource value $R(t)$ will be given by $R^0$ for $t < S_A$ and $R^0 - r_A$ for $t \ge S_A$
• a non-consumable resource (e.g., power or data rate) is depleted by an amount $r_A$ at the start of an activity $A$ that uses it, and replenished by an equal amount at the end of the activity. If $A$ is the only activity in the plan, the resource value $R(t)$ will be given by $R^0$ for $t < S_A$, by $R^0 - r_A$ for $S_A \le t < E_A$, and by $R^0$ for $t \ge E_A$.

When several activities exist in the plan, their effect on the resource history $R(t)$ is defined as the sum of their individual effects. The objective function $V$, which measures the scientific value of an activity in our model, is similar to a consumable resource, but activities contribute a positive amount $v$ to $V$.

In this problem, we will assume that we are given $M$ resources $\{R_m \mid 1 \le m \le M\}$, each one with its default value $R^0_m$. The relationship between an activity and the set of $M$ resources and the objective function $V$ can be formalized in terms of an activity type $T$, which consists of the following:
• a duration $D_T \ge 0$
• a science value $v_T \ge 0$
• a set of $M$ quantities $\{r^T_m \mid 1 \le m \le M\}$, where $r^T_m$ represents the amount of resource $R_m$ used by an activity of type $T$

We can then say that an activity $A$ is of type $T$ if the following is true:
• $E_A - S_A = D_T$
• the science value of activity $A$ is $v_T$
• the amount of resource $R_m$ used by $A$ is $r^T_m$

With these preliminaries, we can formulate the planning problem as follows. Given $M$ resources $\{R_m\}$, $N$ activity types $\{T_n\}$ and a planning horizon $[S, E]$, compute a plan $P$, i.e., a set $\{A_i \mid i \in I\}$ of activities, such that
• $S \le S_{A_i} \le E_{A_i} \le E$, i.e., each activity fits in the planning horizon
• each activity $A_i$ is of some type $T_i$
• each resource $R_m$ stays positive over the planning horizon
• the objective function $V$ is as large as possible at the end of the plan

It is very hard to solve this problem exactly because of the large number of possibilities that have to be examined. To simplify the problem, we will constrain the values of the $M$ resources to integers in the range $\{0, 1, \dots, 2^B - 1\}$, where $B$ is a discretization parameter. Note that in that case we must similarly restrict the resource usage values $r^{T_n}_m$ to lie in the same interval. The problem in its final form is then to carry out the following three steps:
1. formulate a heuristic approach that does a good job for small values of $B$ (certainly $B = 1$ and 2 should be looked at in some detail)
2. extrapolate as much as you can towards higher values of $B$
3. state any conclusions that you have reached in terms of (a) how quickly the complexity of the problem grows with $B$, and (b) how well a problem with a given $B$ can be approximated by a similar problem with $B' < B$
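The resource model defined above can be made concrete with a short sketch for a single resource. It is an illustration under stated assumptions, not a proposed heuristic: the class and function names are hypothetical, the feasibility test treats "stays positive" as "never drops below zero" (which is what the integer range starting at 0 suggests), and the example uses B = 2, so resource values lie in {0, 1, 2, 3}.

```python
from dataclasses import dataclass

@dataclass
class Activity:
    start: float      # S_A
    duration: float   # D_T of the activity's type
    usage: int        # r_A, amount of the resource used
    value: float      # v_T, science value

    @property
    def end(self):
        return self.start + self.duration   # E_A

def resource_value(t, default, activities, consumable):
    """R(t): the default value minus the summed effects of all activities."""
    level = default
    for a in activities:
        if consumable:
            active = t >= a.start            # depleted for good
        else:
            active = a.start <= t < a.end    # replenished at the end
        if active:
            level -= a.usage
    return level

def is_feasible(plan, default, consumable, horizon):
    """Check the horizon and resource constraints for one resource."""
    s, e = horizon
    if any(a.start < s or a.end > e for a in plan):
        return False
    # R(t) is piecewise constant and only decreases at activity starts,
    # so its minimum over the horizon is attained at one of those starts.
    return all(resource_value(a.start, default, plan, consumable) >= 0
               for a in plan)

def science_value(plan):
    """Objective V at the end of the plan."""
    return sum(a.value for a in plan)

a1 = Activity(start=0, duration=4, usage=2, value=5)
a2 = Activity(start=2, duration=3, usage=1, value=3)
a3 = Activity(start=3, duration=2, usage=1, value=4)
default = 3                                  # 2**2 - 1 with B = 2
horizon = (0, 10)
two_ok = is_feasible([a1, a2], default, False, horizon)
three_ok = is_feasible([a1, a2, a3], default, False, horizon)
```

Any heuristic for small B needs a check of this kind as its inner loop, so its cost per candidate plan is one of the quantities that grows as B and the number of activities increase.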

Problem 5. Properties of a Gradient Descent Algorithm for Active Vibration Control Presenter: Mark R. Jolly Lord Corporation Thomas Lord Research Center 110 Lord Dr., Cary, NC 27511 p. (919) 469-2500 x2335 f. (919) 460-9648 [email protected]


Team Members: Jeff Baker, Ivan Christov, Simona Dediu, Elana Fertig, Wanda Strychalski, Liming Wang Faculty Consultants: H.T. Banks, Shuhua Hu, Ralph Smith Working Session: Room 129 Harrelson

In the past six years, Lord Corporation has developed and commercialized active vibration control systems for helicopters. These systems consist of inertial force generators, vibration sensors, and a controller. The controller employs an adaptive gradient descent algorithm to produce commands for the force generators such that a cost function based on the vibration sensors is minimized. Such algorithms, sometimes referred to as least-mean-square (LMS) algorithms, are well understood in terms of performance and convergence properties. A new type of force generator has been developed that has required the development of a new type of adaptive gradient descent algorithm. While these algorithms have proven effective, little is understood about their convergence properties. This project will entail developing an understanding of this algorithm and its convergence properties through analysis and simulation.
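For readers unfamiliar with this class of algorithm, a generic textbook form of gradient descent on a quadratic vibration cost is sketched below. It is not Lord Corporation's algorithm, and it does not model the new force generator. It assumes a known, real-valued response matrix `T` from actuator commands to sensor readings and an uncontrolled vibration vector `d`, and it minimizes J(u) = ||d + T u||^2. All names and numbers are illustrative.

```python
def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def adapt(T, d, step, iterations):
    """Gradient descent on J(u) = ||d + T u||^2.

    T: sensors-by-actuators response matrix, d: uncontrolled vibration.
    Returns the final command vector and the cost at every iteration.
    """
    u = [0.0] * len(T[0])
    Tt = transpose(T)
    history = []
    for _ in range(iterations):
        e = [di + yi for di, yi in zip(d, matvec(T, u))]   # sensor error
        history.append(sum(x * x for x in e))              # cost J(u)
        grad = matvec(Tt, e)   # proportional to dJ/du (factor 2 folded into step)
        u = [ui - step * gi for ui, gi in zip(u, grad)]
    return u, history

T = [[1.0, 0.2],
     [0.1, 0.8]]
d = [1.0, -0.5]
u, history = adapt(T, d, step=0.5, iterations=200)
```

For this update the iteration converges when `step` is smaller than 2 divided by the largest eigenvalue of the matrix T-transpose times T. Convergence conditions of this type are what the project seeks to establish for the new algorithm.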

Problem 6. Analysis of Biological Interaction Networks for Drug Discovery
Presenters:
Laura Potter, Scientific Computing and Mathematical Modeling, GlaxoSmithKline, 5 Moore Drive, P.O. Box 13398, Research Triangle Park, NC 27709 p. (919) 483-2161 [email protected]
Chetan Gadgil, Scientific Computing and Mathematical Modeling, GlaxoSmithKline, 709 Swedeland Road, King of Prussia, PA 19406 p. (610) 270-669 [email protected]
Team Members: Aditi Baker, Suzanne Harvey, Minkyung Jung, Chung-min Lee, Inga Maslova, Maureen Morton, Jian Wang
Faculty Consultants: Mette Olufsen
Working Session: Room 346 Harrelson

Many physiological processes involve complex signal transduction cascades, in which large numbers of species interact with each other. One of the critical steps in the drug development process is the identification of a "target," a molecular species in the body whose functioning can potentially be altered by dosing with an external substance. The target is just one component of a huge signal transduction network, and it is critical to understand the potential impact of interventions at the target and any drug induced changes on the overall network. One common approach to validating potential targets is to "knock out," or functionally remove the target from an animal model and then measure the resulting effects on the biology of interest. With mathematical models of signal transduction networks, an in silico version of target validation can be conducted by computationally removing a species and predicting the effects on the system. The disruptiveness of a given knock-out on the network then can be determined by examining how the knock-out changes the behavior of other species in the network. In some cases, it may be desirable to disrupt as much of the network as possible, while in other cases, only a specific region of the network is to be disrupted. As the models become large, analyzing complex signaling networks becomes increasingly difficult, so that development of general automatable analysis tools is critical. This project will focus on developing mathematical methods to: (1) quantify network properties; (2) quantify the disruptiveness of knock-outs on the network; and


(3) apply the analysis techniques to a small model of signal transduction. The analysis is needed both for the case when only qualitative information about the network is available and for the case when a detailed mathematical model that quantitatively expresses the interactions is available. It is anticipated that graph-theoretic analysis approaches will be applicable in the first case, and dynamical analysis techniques will be used when a kinetic model is available.
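For the qualitative case, one simple graph-theoretic measure can be sketched as follows. It is only one possible definition of disruptiveness, chosen here for illustration: the fraction of species downstream of a signal source that can no longer be reached once a species is knocked out. The network, the species names and the function names are hypothetical.

```python
from collections import deque

def reachable(graph, source, removed=frozenset()):
    """Species reachable from `source` along directed interactions (BFS)."""
    if source in removed:
        return set()
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt not in removed:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def disruptiveness(graph, source, knocked_out):
    """Fraction of species downstream of `source` that lose their connection
    to it when `knocked_out` is removed (the knocked-out species itself is
    not counted)."""
    before = reachable(graph, source) - {source, knocked_out}
    after = reachable(graph, source, removed={knocked_out})
    if not before:
        return 0.0
    return len(before - after) / len(before)

# Hypothetical cascade: receptor R activates A and B; both activate C;
# C activates D; A also activates E.
network = {"R": ["A", "B"], "A": ["C", "E"], "B": ["C"], "C": ["D"]}
scores = {s: disruptiveness(network, "R", s) for s in ["A", "B", "C"]}
```

In this toy network, knocking out B disrupts nothing because A provides a redundant path to C. Redundancy of this kind is one of the network properties such a tool would need to quantify.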

B. Two-Day Undergraduate Workshop November 17-18, 2006

Friday – November 17, 2006 NISS Building, Room 104

9:15 AM CSR shuttle departs StudioPLUS Hotel

9:30-10:00 Registration and Breakfast

10:00-10:10 Welcome and Introduction to SAMSI Ralph Smith, North Carolina State University Associate Director of CRSC and SAMSI

10:10-11:10 “Riemannian Geometry, Random Walks and the Statistical Analysis of Multivariate Time Series” Makram Talih, Hunter College-CUNY and SAMSI

11:10-Noon “Statistical Models for Climate Change” Serge Guillas, Georgia Tech and SAMSI

Noon-1:00 Lunch

1:00-3:00 “Numerical Experiments in Random Matrix Theory” Raj Rao, Massachusetts Institute of Technology and SAMSI

3:00-3:15 Coffee Break

3:15-4:15 “Regularization and High-Dimensional Inference” Eitan Greenshtein, North Carolina State University and SAMSI

4:15-4:30 Review Session Ralph Smith, North Carolina State University and SAMSI

4:30 CSR shuttle back to StudioPLUS Hotel

5:20 CSR shuttle to Park Diner Restaurant

5:30-7:00 Dinner at Park Diner Restaurant

7:00 CSR shuttle back to StudioPLUS Hotel


Saturday – November 18, 2006 NISS Building, Room 104

8:30 CSR shuttle departs StudioPLUS Hotel

8:45-9:15 Arrival at SAMSI and Breakfast

9:15-10:00 “Some Uses of Random Matrix Theory in Statistics” Noureddine El Karoui, University of California-Berkeley and SAMSI

10:00-11:00 “Rare Events in Nonlinear Lightwave Systems” Elaine Spiller, Duke University and SAMSI

11:00-11:15 Coffee Break

11:15-Noon “Random Matrices in Multivariate Statistical Analysis” Donald Richards, Penn State University and SAMSI

Noon Adjournment and Departure CSR shuttle will depart SAMSI for the airport. If you need transportation to return to the hotel, please see a member of the SAMSI staff.

C. Two-Day Undergraduate Workshop March 2-3, 2007

Friday – March 2, 2007 NISS Building, Room 104

9:15 Charlene’s Safe Ride (CSR) departs Crestwood Suites for SAMSI

9:30-10:00 Arrival at SAMSI and Breakfast

10:00-10:10 Welcome and Introduction to SAMSI Ralph Smith, North Carolina State University Associate Director of CRSC and SAMSI

10:10-11:10 “Overview Talk on Development, Assessment and Utilization of Computer Simulators” Tom Santner, Ohio State University

11:10-12:00 “An Introduction to Deterministic Epidemiological Models” Ariel Cintron-Arias, SAMSI

12:00-1:00 Lunch

1:00-2:00 “Models of Aquatic Ecosystems” Peter Reichert, EAWAG/ETH Switzerland and SAMSI


2:00-2:15 Coffee Break

2:15-4:15 Hands-on Computer Simulation of Biogeochemical and Ecological Processes in Lakes

4:15-4:30 Review Session Ralph Smith, North Carolina State University and SAMSI

4:30 CSR shuttle back to Crestwood Suites

5:20 CSR shuttle to Park Diner Restaurant

5:30-7:00 Dinner at Park Diner Restaurant

7:00 CSR shuttle back to Crestwood Suites

Saturday – March 3, 2007 NISS Building, Room 104

8:30 AM CSR shuttle departs Crestwood Suites

8:45-9:15 Arrival at SAMSI and Breakfast

9:15-10:00 “Introduction to Biochemical Network Modeling” Darren Wilkinson, University of Newcastle

10:00-11:00 “Terrestrial Models” Sean McMahon, Duke University

11:00-11:15 Coffee Break

11:15-12:00 “Statistics and Climate Modeling” Cari Kaufmann, SAMSI

12:00 PM Adjournment and Departure

CSR shuttle will depart SAMSI for the airport. If you need transportation to return to the hotel, please see a member of the SAMSI staff.

G. CO-SPONSORED AND INFORMAL MEETINGS AND WORKSHOPS

A. T-O-Y 2007 Workshop on Geophysical Models (at NCAR) November 13-14, 2006

Tutorial/Research/Expository Lectures Monday, November 13, 2006 Mesa Lab, Damon Room


8:00 - 8:30 Registration and Coffee

8:30 - 9:00 Welcome and Introductory remarks: Doug Nychka, NCAR Institute for Mathematics Applied to Geosciences; Derek Bingham, Dept. of Statistics and Actuarial Science, Simon Fraser University

9:00 - 10:20 Johannes Feddema, University of Kansas; Gordon Bonan, NCAR Terrestrial Sciences Section/CGD: Anthropogenic Land Cover Change Experiments in the CCSM

10:20 - 10:40 Break

10:40 - 12:10 Pablo Mininni, NCAR Turbulence Numerics Team: Statistical Properties of Turbulent Flows

12:10 - 1:40 Lunch

1:40 - 3:00 Hanli Liu, Art Richmond, and Michael Wiltberger, NCAR High Altitude Observatory: The Upper Atmosphere: Problems in Developing Realistic Models

3:00 - 3:30 Break

3:30 - 4:50 Joshua Hacker, NCAR Research Applications Laboratory: The Planetary Boundary Layer and Uncertainty in Lower Boundary Conditions

4:50 - 5:20 Closing Remarks

5:20 - 6:00 Mixer

Tuesday, November 14, 2006 Center Green Building 1, Room 3131

8:00 - 8:20 Coffee

8:20 - 8:50 Instructions to break-out groups

8:50 - 10:30 Break-out groups discussion

10:30 - 10:45 Break

10:45 - 12:00 Groups presentations and conclusions

12:00 Adjourn

Appendix E – Workshop Evaluations Summary

At every SAMSI Workshop participants were given an evaluation questionnaire to complete. A sample questionnaire is at the end of this appendix. Summaries of the participant evaluations are presented below.

The evaluations of scientific content are presented in three graphs: i) SAMSI 2006-7 Program workshops, ii) Follow-on workshops to previous SAMSI Programs and iii) Student workshops, at both graduate and undergraduate levels. At least 85% of the participants at each SAMSI Program workshop, whether contemporaneous or follow-on, rated the scientific content Very Good to Excellent. For undergraduate workshops, the ratings were more varied, with fewer generally rating the workshops Excellent and a higher proportion rating them Good to Very Good. Judging from the undergraduates’ written comments, the satisfaction with the science of the workshops depended on the level of the individual student’s preparation as well as the quality of the workshop itself. However, it is also noteworthy that some students who volunteered that the technical level of the workshop was beyond their current capability also wrote enthusiastically about their participation.

SAMSI staff and facilities for workshops have been very highly rated ever since SAMSI opened its doors. The continuing satisfaction with SAMSI staff is a point of pride. Minor problems with transportation to/from SAMSI continue to receive attention to keep the workshops running smoothly. Note that undergraduates for the Interdisciplinary Undergraduate Workshop and graduate students for the Industrial Mathematical, Statistical Modeling workshop are housed on campus at NC State, so that their primary activities are within walking distance.

In 2006-7 SAMSI workshops attracted from 22 to 146 participants; several of these (both large and small) were oversubscribed for the available space and/or for the workshop goals. Keys to abbreviations on the graphical summaries follow.

2006-7 Programs

CompMod: Development, Assessment and Utilization of Complex Computer Models
• SAMSI/NPCDS Summer School: 57 participants
• Opening Workshop: 124 participants
• Joint Engineering & Methodology Workshop: 22 participants
• Biosystems Modeling: 62 participants
• SAMSI/MUCM Mid-Program: 29 participants
• Terrestrial Models Mid-Program: 16 participants

RM: High Dimensional Inference and Random Matrices
• Opening Workshop: 146 participants
• Bayesian Focus Workshop: 49 participants
• Large Graphical Models Workshop: 23 participants
• Geometry Workshop: 34 participants

MultRep: Multiplicity and Reproducibility in Scientific Studies
• Opening Workshop: 60 participants

Education & Outreach

CAARMS: 12th Annual Conference for African American Researchers in the Mathematical Sciences: 68 participants

Undergraduates:
• Summer Interdisciplinary workshop: 23 participants
• Fall Undergraduate two-day workshop: 31 participants
• Spring Undergraduate two-day workshop: 31 participants

Graduate students:
• IMSM: Industrial Mathematical, Statistical Modeling: 38 participants

[Figure: Evaluation of Science at SAMSI Workshops for Complex Computer Models Program. Percentage of responses rated Excellent, Very Good, Good, Fair, or Poor for each workshop: CompMod SAMSI/NPCDS Summer School; CompMod Opening Workshop; CompMod Joint Engineering & Methodology Workshop; CompMod Biosystems Modeling Workshop; CompMod SAMSI/MUCM Mid-Program; CompMod Terrestrial Models Mid-Program.]

[Figure: Evaluation of Science at SAMSI Workshops for Random Matrices and Multiplicity and Reproducibility Programs. Percentage of responses rated Excellent, Very Good, Good, Fair, or Poor for each workshop: Random Matrices Opening Workshop; Random Matrices Bayesian Focus Workshop; RM Large Graphical Models Workshop; RM Geometry Workshop; MultRep Opening Workshop.]

[Figure: Evaluation of Science at SAMSI Education & Outreach Workshops: 2006-07. Percentage of responses rated Excellent, Very Good, Good, Fair, or Poor for each workshop: Undergrad Interdisciplinary Workshop; CAARMS Workshop; Undergrad Two-Day Fall Workshop; Undergrad Two-Day Spring Workshop; IMSM for Graduate Students Workshop.]

[Figure: Evaluation of Staff at SAMSI Workshops for Complex Computer Models Program. Percentage of responses rated Excellent, Very Good, Good, Fair, or Poor for each workshop: CompMod SAMSI/NPCDS Summer School; CompMod Opening Workshop; CompMod Joint Engineering & Methodology Workshop; CompMod Biosystems Modeling Workshop; CompMod SAMSI/MUCM Mid-Program; CompMod Terrestrial Models Mid-Program.]

[Figure: Evaluation of Staff at SAMSI Workshops for Random Matrices and Multiplicity and Reproducibility Programs. Percentage of responses rated Excellent, Very Good, Good, Fair, or Poor for each workshop: Random Matrices Opening Workshop; Random Matrices Bayesian Focus Workshop; RM Large Graphical Models Workshop; RM Geometry Workshop; MultRep Opening Workshop.]

[Figure: Evaluation of Staff at SAMSI Education & Outreach Workshops: 2006-07. Percentage of responses rated Excellent, Very Good, Good, Fair, or Poor for each workshop: Undergrad Interdisciplinary Workshop; CAARMS Workshop; Undergrad Two-Day Fall Workshop; Undergrad Two-Day Spring Workshop; IMSM for Graduate Students Workshop.]

[Figure: Evaluation of Facilities at SAMSI Workshops for Complex Computer Models Program. Percentage of responses rated Excellent, Very Good, Good, Fair, or Poor for each workshop: CompMod SAMSI/NPCDS Summer School; CompMod Opening Workshop; CompMod Joint Engineering & Methodology Workshop; CompMod Biosystems Modeling Workshop; CompMod SAMSI/MUCM Mid-Program; CompMod Terrestrial Models Mid-Program.]

[Figure: Evaluation of Facilities at SAMSI Workshops for Random Matrices and Multiplicity and Reproducibility Programs. Percentage of responses rated Excellent, Very Good, Good, Fair, or Poor for each workshop: Random Matrices Opening Workshop; Random Matrices Bayesian Focus Workshop; RM Large Graphical Models Workshop; RM Geometry Workshop; MultRep Opening Workshop.]

[Figure: Evaluation of Facilities at SAMSI Education & Outreach Workshops: 2006-07. Percentage of responses rated Excellent, Very Good, Good, Fair, or Poor for each workshop: Undergrad Interdisciplinary Workshop; CAARMS Workshop; Undergrad Two-Day Fall Workshop; Undergrad Two-Day Spring Workshop; IMSM for Graduate Students Workshop.]

[Figure: Evaluation of Lodging at SAMSI Workshops for Complex Computer Models Program. Percentage of responses rated Excellent, Very Good, Good, Fair, or Poor for each workshop: CompMod SAMSI/NPCDS Summer School; CompMod Opening Workshop; CompMod Joint Engineering & Methodology Workshop; CompMod Biosystems Modeling Workshop; CompMod SAMSI/MUCM Mid-Program; CompMod Terrestrial Models Mid-Program.]

[Figure: Evaluation of Lodging at SAMSI Workshops for Random Matrices and Multiplicity and Reproducibility Programs. Percentage of responses rated Excellent, Very Good, Good, Fair, or Poor for each workshop: Random Matrices Opening Workshop; Random Matrices Bayesian Focus Workshop; RM Large Graphical Models Workshop; RM Geometry Workshop; MultRep Opening Workshop.]

[Figure: Evaluation of Lodging at SAMSI Education & Outreach Workshops: 2006-07. Percentage of responses rated Excellent, Very Good, Good, Fair, or Poor for each workshop: Undergrad Interdisciplinary Workshop; CAARMS Workshop; Undergrad Two-Day Fall Workshop; Undergrad Two-Day Spring Workshop; IMSM for Graduate Students Workshop.]

[Figure: Evaluation of Transportation at SAMSI Workshops for Complex Computer Models Program. Percentage of responses rated Excellent, Very Good, Good, Fair, or Poor for each workshop: CompMod SAMSI/NPCDS Summer School; CompMod Opening Workshop; CompMod Joint Engineering & Methodology Workshop; CompMod Biosystems Modeling Workshop; CompMod SAMSI/MUCM Mid-Program; CompMod Terrestrial Models Mid-Program.]


[Figure: Evaluation of Transportation at SAMSI Workshops for Random Matrices and Multiplicity and Reproducibility Programs. Percentage of responses rated Excellent, Very Good, Good, Fair, or Poor for each workshop: Random Matrices Opening Workshop; Random Matrices Bayesian Focus Workshop; RM Large Graphical Models Workshop; RM Geometry Workshop; MultRep Opening Workshop.]

[Figure: Evaluation of Transportation at SAMSI Education & Outreach Workshops: 2006-07. Percentage of responses rated Excellent, Very Good, Good, Fair, or Poor for each workshop: Undergrad Interdisciplinary Workshop; CAARMS Workshop; Undergrad Two-Day Fall Workshop; Undergrad Two-Day Spring Workshop; IMSM for Graduate Students Workshop.]


SAMSI Evaluation Workshop on Biosystems Modeling March 5-7, 2007

Your feedback on this workshop is requested by SAMSI’s funding agencies, who view it as important for assessing and improving our performance. Your feedback is also gratefully appreciated by SAMSI’s directors, because it will enable us to immediately improve SAMSI activities. Please fill out this form and hand it to a SAMSI Staff Member, or return it by mail.

0. Personal Information: We are required by our funding agencies to obtain information – in a standard format – about all participants in SAMSI activities. If you have not already done so, please go to www.samsi.info/PartInfo/200607/participantinformationform.html to provide this information. Note that if you have participated in a SAMSI activity since last July 1 and completed this webform, you need not do so again, unless your personal information has changed.

1. General Ratings (1 = Poor, 2 = Fair, 3 = Good, 4 = Very Good, 5 = Excellent):

a. Scientific Quality 1 2 3 4 5

b. Staff Helpfulness 1 2 3 4 5

c. Meeting Room/AV Facilities 1 2 3 4 5

d. Lodging 1 2 3 4 5

e. Local Transportation 1 2 3 4 5

2a. What were the positive aspects of the organization and running of this workshop?

______

______

2b. What parts of the organization and running need improvement?

______

______

3. Please comment on the Scientific Quality:

a. Innovation: ______

______

b. Communication: ______

______

c. Level: ______

______

4. Additional comments on any other aspects of the workshop

______

______

______

5. An important goal of SAMSI is to create synergies between disciplines. How well did this workshop further this goal?

______

6. How did you learn of this workshop?

______

7. Please suggest ideas / contacts for future SAMSI activities

______

______

______
