
Princeton University Workshop on Frontiers of Statistics

in Honour of Professor Peter Bickel’s 65th Birthday

May 18–20, 2006, Princeton, USA

Table of Contents

Acknowledgements ...... 1
Background ...... 2
Biography of Peter J. Bickel ...... 3
Committees ...... 4
Invited Speakers ...... 5
Program Overview ...... 6
Directions Map ...... 7
Program ...... 8
Abstracts ...... 13
Workshop Participants ...... 33
Contents of Book “Frontiers of Statistics” ...... 38
Special Thanks ...... 43

Acknowledgements

Sponsors

We gratefully acknowledge the generous financial support of:

Minerva Research Foundation

Bendheim Center for Finance, Princeton University

National Science Foundation

Department of Operations Research & Financial Engineering,

Princeton University

and academic support of:

Institute of Mathematical Statistics

International Indian Statistical Association

Background

The workshop intends to bring together senior and junior researchers to define and expand the frontiers of statistics. It provides a focal venue for them to gather, interact and present their new research findings, to discuss and outline emerging problems in their fields, and to lay the groundwork for fruitful future collaborations. A distinguishing feature is that all topics are in core statistics while interacting with other disciplines such as biology, medicine, engineering, computer science, economics and finance. Topics include: (1) nonparametric inference and machine learning; (2) longitudinal and functional data analysis; (3) time series and finance; (4) computational biology and biostatistics; (5) MCMC, bootstrap, and robust statistics; (6) experimental design and industrial engineering. The workshop also serves advanced graduate students and young researchers looking for new topics to work on, and experienced researchers who hope to gain an overview of contemporary developments in statistics.

The workshop is held on the occasion of the 65th birthday of Professor Peter J. Bickel, Professor of Statistics, University of California, Berkeley, one of the most celebrated statisticians of our time. A book, “Frontiers of Statistics”, based on the topics presented at the workshop, will soon be published in celebration of Professor Bickel’s 65th birthday. The book will map the frontiers of the various disciplines in statistics and provide useful references on the latest developments in each subject. It will be helpful to both new and experienced researchers who wish to gain a bird’s-eye view of the various frontiers of statistics.

Biography of Peter J. Bickel

Peter Bickel has been a leading figure in the field of statistics in the 43 years since he received his Ph.D. in Statistics at the age of 22. He is widely recognized as one of the greatest statisticians of our time by any metric: breadth, depth and productivity. He has made wide-ranging and far-reaching contributions to the discipline, pioneering research in many statistical areas and making fundamental contributions to many of them. These include robust statistics, decision theory, semiparametric modeling, the bootstrap, nonparametric modeling, machine learning, computational biology, and many other areas (e.g. transportation and genomics) where statistics and quantitative approaches play an important role. His exceptional record of research accomplishment is evidenced by his many publications in the top-ranking journals in the field of statistics. His scientific findings have strongly reshaped statistical thinking, methodological development, theoretical studies, and data analysis. His research has strongly influenced the development of other quantitative disciplines such as engineering, economics, finance, computational biology, and public health. Bickel’s wide-ranging and far-reaching contributions to statistics have been recognized internationally by numerous awards and honors. These include being the first recipient of the COPSS Presidents’ Award in 1980, and serving as the Wald Lecturer in 1980. His work has also been widely recognized outside the statistical profession, as reflected in his John D. and Catherine T. MacArthur Foundation Fellowship in 1984; his Guggenheim, NATO, and Miller Fellowships; and his election to the American Academy of Arts and Sciences in 1985, the National Academy of Sciences in 1985, and the Royal Netherlands Academy of Arts and Sciences in 1995. He was also honored as (UC Berkeley) Chancellor’s Distinguished Professor (1996–1999).
Professor Bickel is a strong professional leader. He has provided strong leadership at all levels: enthusiastic administrative service to Berkeley as department chairman (1976–79, 1993–98), director of the Statistical Laboratory (1987–92), dean (twice) of the Physical Sciences, and member of many other important committees; professional service as President of the Institute of Mathematical Statistics (1980–1982), President of the Bernoulli Society (1991–1993), and member of the Board of Trustees of the National Institute of Statistical Sciences (1991– ); and service at the national level in various leading positions in the National Academy of Sciences, the National Research Council, the Council of Scientific Advisors, and the American Association for the Advancement of Science.

Scientific Committee:

Jianqing Fan (Chair), Princeton University
Luisa Fernholz, Princeton University
Hira Koul, Michigan State University
Hans Müller, University of California at Davis
Vijay Nair, University of Michigan
Ya’acov Ritov, Hebrew University of Jerusalem
Jeff Wu, Georgia Institute of Technology

Organizing Committee:

Jianqing Fan (Chair), Princeton University
Luisa T. Fernholz, Temple University
Heng Peng, Princeton University
Chongqi Zhang, Guangzhou University
Yazhen Wang, University of Connecticut

Committee on Travel Support:

Luisa T. Fernholz (Chair), Princeton University
Jianqing Fan, Princeton University
Liza Levina, University of Michigan
Yijun Zuo, Michigan State University
Yazhen Wang, University of Connecticut

Invited Speakers:

Yacine Ait-Sahalia, Princeton University
Donald Andrews, Yale University
Peter Bühlmann, Swiss Federal Institute of Technology Zurich
Kjell Doksum, University of California, Berkeley
David Donoho, Stanford University
Ursula Gather, University of Dortmund
Jayanta K. Ghosh, Purdue University
Friedrich Goetze, University of Bielefeld
Peter G. Hall, The Australian National University
Haiyan Huang, University of California at Berkeley
Jiming Jiang, University of California, Davis
Hira Koul, Michigan State University
Soumendra N. Lahiri, Iowa State University
Elizaveta Levina, University of Michigan
Jun Liu, Harvard University
Regina Liu, Rutgers University
Xiaoli Meng, Harvard University
Stephan Morgenthaler, EPFL Learning Center
Hans Müller, University of California, Davis
Vijay Nair, The University of Michigan
Byeong Park, Seoul National University
Nancy Reid, University of Toronto
John Rice, University of California, Berkeley
Yaacov Ritov, Israel Social Sciences Data Center
Anton Schick, Binghamton University
Chris Sims, Princeton University
David Tyler, Rutgers University
Sara van de Geer, Swiss Federal Institute of Technology Zurich
Mark van der Laan, University of California, Berkeley
Willem van Zwet, University of Leiden
Jane-Ling Wang, University of California, Davis
Jon Wellner, University of Washington
Yazhen Wang, University of Connecticut
Jeff C. Wu, Georgia Institute of Technology
Zhiliang Ying, Columbia University
Chunming Zhang, University of Wisconsin at Madison
Yijun Zuo, Michigan State University

Program Overview

(Friday morning talks run in two parallel sessions; the Friday afternoon session is joint.)

Time         Thursday              Friday I             Friday II            Saturday
8:30-8:45    Registration
8:45-9:00    Opening Ceremony
9:00-9:30    Peter G. Hall         Jun Liu              Hira Koul            John Rice
9:30-10:00   Peter Bühlmann        Haiyan Huang         Anton Schick         Jon Wellner
10:00-10:30  Sara van de Geer      Zhiliang Ying        Soumendra N. Lahiri  Ursula Gather
10:30-11:00  Photo and Break       Break                Break                Break
11:00-11:30  Willem van Zwet       Hans Müller          Regina Liu           Jayanta K. Ghosh
11:30-12:00  Nancy Reid            Chunming Zhang       Yijun Zuo            Xiaoli Meng
12:00-12:30  Friedrich Goetze      Byeong Park          Jiming Jiang         Jeff Wu
12:30-14:00  Lunch                 Lunch                Lunch                Lunch
14:00-14:30  Kjell Doksum          David Donoho
14:30-15:00  Jane-Ling Wang        David Tyler
15:00-15:30  Stephan Morgenthaler  Yaacov Ritov
15:30-16:00  Break                 Break
16:00-16:30  Vijay Nair            Chris Sims
16:30-17:00  Elizaveta Levina      Yacine Ait-Sahalia
17:00-17:30  Mark van der Laan     Donald Andrews
Program

May 17, 2006 (Wednesday)

19:30-21:30 Reception Palmer House (http://www.princeton.edu/palmerhouse/) Tel: 609-258-3715 Fax: 609-258-0526

May 18, 2006 (Thursday)

8:00-8:45    Registration    F101∗
8:45-9:00    Opening Ceremony    F101    Chair: Jianqing Fan

Invited Session
9:00-10:30   Chair: Don Fraser    F101
  9:00   Peter G. Hall    Some theory for classifiers in high-dimensional, low sample size settings
  9:30   Peter Bühlmann    Very high-dimensional data: prediction and variable selection
  10:00  Sara van de Geer    Oracle inequalities for the LASSO
10:30-11:00  Photo and Break
11:00-12:30  Chair: Ursula Gather    F101
  11:00  Willem van Zwet    An expansion for a discrete non-lattice distribution
  11:30  Nancy Reid    Applied Asymptotics
  12:00  Friedrich Goetze    Edgeworth Approximations for Symmetric Statistics

12:30-14:00  Lunch (Friend Convocation Room)
14:00-15:30  Chair: Luisa Fernholz    F101
  14:00  Kjell Doksum    Powerful Choices: Variable and Tuning Constant Selection in Nonparametric Regression based on Power
  14:30  Jane-Ling Wang    Flexible Approaches to Model Survival and Longitudinal Data Jointly
  15:00  Stephan Morgenthaler    Smoothing Large Tables
15:30-16:00  Break
16:00-17:30  Chair: David Blei    F101
  16:00  Vijay Nair    Statistical Inverse Problems in Active Network Tomography
  16:30  Elizaveta Levina    Detection in Wireless Sensor Networks
  17:00  Mark van der Laan    Estimating function based cross-validation

End of day 1

∗Friend 101

Program

May 19, 2006 (Friday)

8:45-9:00    Registration    F006∗

Parallel Invited Sessions
9:00-10:30   Chair: Julian Faraway    F006
  9:00   Jun Liu    Bayesian Methods in Haplotype Inference and Disease Mapping
  9:30   Haiyan Huang    A statistical framework to infer functional gene associations from multiple biologically dependent microarray experiments
  10:00  Zhiliang Ying    Semiparametric mixed effects models for duration and longitudinal data

9:00-10:30   Chair: Run-ze Li    F004∗∗
  9:00   Hira Koul    Goodness-of-fit testing in interval censoring case 1
  9:30   Anton Schick    Efficient estimators for time series
  10:00  Soumendra N. Lahiri    Edgeworth expansions for sums of block-variables under weak dependence

10:30-11:00  Break

11:00-12:30  Chair: Richard Samworth    F006
  11:00  Hans Müller    Functional Variance
  11:30  Chunming Zhang    Spatially Adaptive Functional Linear Regression with Functional Smooth Lasso
  12:00  Byeong Park    Estimation and Testing for Varying Coefficients in Additive Models with Marginal Integration

11:00-12:30  Chair: Miriam Donoho    F004
  11:00  Regina Liu    Mining Massive Text Data: Classification, Construction of Tracking Statistics and Inference under Misclassification
  11:30  Yijun Zuo    Multi-Dimensional Trimming Based on Data Depth
  12:00  Jiming Jiang    Fence Methods: Another Look at Model Selection

12:30-14:00  Lunch (Friend Convocation Room)

Invited Session
14:00-15:30  Chair: Stephan Morgenthaler    FCR∗∗∗
  14:00  David Donoho    Sparsity in Inference: past trends, future promise
  14:30  David Tyler    Invariant coordinate selection (ICS): A robust statistical perspective on independent component analysis (ICA)
  15:00  Yaacov Ritov    Some remarks on non-linear dimension reduction
15:30-16:00  Break
16:00-17:30  Chair: Yazhen Wang    FCR
  16:00  Chris Sims    Bayesian Inference in Central Banks: Recent Developments in Monetary Policy Modeling
  16:30  Yacine Ait-Sahalia    Likelihood Inference for Diffusions
  17:00  Donald Andrews    The Limit of Finite Sample Size and a Problem with Subsampling

End of day 2

∗Friend 006 ∗∗Friend 004 ∗∗∗Friend Convocation Room

Program

May 20, 2006 (Saturday)

8:45-9:00    Registration    FCR

Invited Sessions
9:00-10:30   Chair: Anirban Dasgupta    FCR
  9:00   John Rice    Multiple Testing in Astronomy
  9:30   Jon Wellner    Goodness of fit via phi-divergences: a new family of test statistics
  10:00  Ursula Gather    Methods of robust online signal extraction and applications
10:30-11:00  Break
11:00-12:30  Chair: Zhezhen Jin    FCR
  11:00  Jayanta K. Ghosh    Convergence and Consistency of Newton’s Algorithm for Estimating a Mixing Distribution
  11:30  Xiaoli Meng    Statistical physics and statistical computing: A critical link – estimating criticality via perfect sampling
  12:00  Jeff Wu    Bayesian Hierarchical Modeling for Integrating Low-accuracy and High-accuracy Experiments
12:30-14:00  Lunch (Friend Convocation Room)

End of day 3

Abstracts

Likelihood Inference for Diffusions
Yacine Ait-Sahalia

Bendheim Center for Finance, Princeton University

This talk surveys recent results on closed form likelihood expansions for discretely sampled diffusions. One major impediment to both theoretical modeling and empirical work with continuous-time models is the fact that in most cases little can be said about the implications of the instantaneous dynamics for longer time intervals. One cannot in general characterize in closed form an object as simple, yet fundamental for everything from prediction to estimation and derivative pricing, as the conditional density of the process, also known as the transition function of the process. I will describe a method which produces accurate approximations in closed form to the transition function of an arbitrary multivariate diffusion. I will then show a connection between this method and saddlepoint approximations and provide examples. Next, I will discuss inference using this method when the state vector is only partially observed, as in stochastic volatility or term structure models. Finally, I will outline the use of this method in specification testing and sketch derivative pricing applications.
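Why closed-form approximations matter is easiest to see on the one diffusion whose transition density is known exactly, the Ornstein-Uhlenbeck process: even there, the naive Euler discretization of the dynamics is only accurate over short sampling intervals. The comparison below is an illustrative sketch of that gap, not an implementation of the expansion method of the talk; the parameter values are invented.

```python
import math

# Ornstein-Uhlenbeck process: dX_t = -theta * X_t dt + sigma dW_t.
# The exact transition density of X_{t+d} given X_t = x is Gaussian with
# mean x*exp(-theta*d) and variance sigma^2 * (1 - exp(-2*theta*d)) / (2*theta).

def normal_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def ou_exact_density(y, x, d, theta, sigma):
    mean = x * math.exp(-theta * d)
    var = sigma ** 2 * (1 - math.exp(-2 * theta * d)) / (2 * theta)
    return normal_pdf(y, mean, var)

def ou_euler_density(y, x, d, theta, sigma):
    # Euler scheme freezes the drift over the interval:
    # X_{t+d} ~ N(x - theta*x*d, sigma^2 * d).
    return normal_pdf(y, x * (1 - theta * d), sigma ** 2 * d)

theta, sigma, x = 1.0, 0.5, 1.0
# Over a short interval the Euler density nearly matches the exact one...
short_gap = abs(ou_euler_density(0.9, x, 0.01, theta, sigma)
                - ou_exact_density(0.9, x, 0.01, theta, sigma))
# ...but over a long interval it deteriorates badly.
long_gap = abs(ou_euler_density(0.9, x, 1.0, theta, sigma)
               - ou_exact_density(0.9, x, 1.0, theta, sigma))
```

For general diffusions no exact density is available to fall back on, which is precisely the gap the closed-form likelihood expansions are designed to fill.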

The Limit of Finite Sample Size and a Problem with Subsampling
Donald W.K. Andrews

Department of Economics, Yale University

This paper considers tests and confidence intervals based on a test statistic that has a limit distribution that is discontinuous in a nuisance parameter or the parameter of interest. The paper shows that standard fixed critical value (FCV) tests and subsample tests often have asymptotic size—defined as the limit of the finite sample size—that is greater than the nominal level of the test. We determine precisely the asymptotic size of such tests under a general set of high-level conditions that are relatively easy to verify. Often the asymptotic size is determined by a sequence of parameter values that approach the point of discontinuity of the asymptotic distribution. The problem is not a small sample problem. For every sample size, there can be parameter values for which the test over-rejects the null hypothesis. Analogous results hold for confidence intervals. We introduce a hybrid subsample/FCV test that alleviates the problem of over-rejection asymptotically and in some cases eliminates it. In addition, we introduce size-corrections to the FCV, subsample, and hybrid tests that eliminate over-rejection asymptotically. In some examples, these size corrections are computationally challenging or intractable. In other examples, they are feasible. This is joint work with Patrik Guggenberger.

Very High-dimensional Data: Prediction and Variable Selection
Peter Bühlmann
Swiss Federal Institute of Technology Zurich

We consider problems where the number of predictor variables p is much larger than the sample size n. For prediction in this regime, the Lasso and boosting algorithms have been shown to be asymptotically consistent, and both often exhibit very good empirical performance. However, the problem of variable selection is much more subtle and difficult than prediction. We will discuss the theoretical and practical potential and limitations of the Lasso and boosting for variable selection, and we will present powerful improvements. The talk is a special birthday tour for Peter Bickel: from ”Relaxed Lasso” via ”Sparse Boosting” to completely different ideas from the ”PC algorithm” in graphical modeling. The methods are used for two problems in computational biology: (i) alternative splicing using single-gene libraries; and (ii) short motif modeling for splice site detection.
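The p ≫ n setting can be sketched numerically in a few lines. The toy below solves the Lasso by plain proximal-gradient (ISTA) iterations and thresholds the result to estimate the support; it is an illustrative sketch only, not the algorithms discussed in the talk, and the data, penalty level, and selection threshold are all ad hoc choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                      # far more predictors than observations
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]    # only 3 truly relevant variables
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.1 * rng.standard_normal(n)

def lasso_ista(X, y, lam, n_iter=2000):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by proximal gradient (ISTA)."""
    n = X.shape[0]
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()  # 1 / Lipschitz constant
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - step * grad
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return b

beta_hat = lasso_ista(X, y, lam=0.2)
selected = np.flatnonzero(np.abs(beta_hat) > 0.5)   # crude support estimate
```

With a well-separated signal the estimated support recovers the three relevant variables; the subtlety the abstract points to is that such recovery fails once the design is correlated or the signal is weak.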

Powerful Choices: Variable and Tuning Constant Selection in Nonparametric Regression based on Power
Kjell Doksum
Department of Statistics, University of California, Berkeley

This paper considers nonparametric multiple regression procedures for analyzing the relationship between a response variable and a vector of covariates. It uses an approach which handles the dilemma that, with high dimensional data, the sparsity of data in regions of the sample space makes estimation of nonparametric curves and surfaces virtually impossible. This is accomplished by abandoning the goal of trying to estimate true underlying curves and instead estimating measures of dependence that can determine important relationships between variables. These dependence measures are based on local parametric fits on subsets of the covariate space that vary in both dimension and size within each dimension. The subset which maximizes a signal-to-noise ratio is chosen. The signal is a local estimate of a dependence parameter which depends on the subset size, and the noise is an estimate of the standard error (SE) of the estimated signal. This approach of choosing the window size to maximize a signal-to-noise ratio lifts the curse of dimensionality because for regions with sparse data the SE is very large. For contiguous Pitman alternatives it corresponds to asymptotically maximizing the probability of correctly finding relationships between covariates and a response, that is, maximizing asymptotic power. It is shown that, within a selected dimension, the bandwidths of the optimally selected subset

do not tend to zero as the sample size n grows, except for alternatives where the length of the intervals on which the alternative differs from the hypothesis tends to zero as n grows. One of the dimension reduction algorithms is used together with MARS and GUIDE and is shown to improve their performance. This is joint work with Chad Schafer, Shijie Tang and Kam Tsui.
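The core device, estimating a local signal over windows of varying size and keeping the window that maximizes |signal|/SE, can be sketched in one dimension. The toy below fits a local slope over windows of several half-widths and picks the one with the largest t-ratio; it is a schematic of the principle only (the paper varies subsets in both dimension and size), and the test function and window grid are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = np.sort(rng.uniform(-2, 2, n))
# Flat for x < 0, then rising with slope 2; small noise.
y = np.where(x > 0, 2.0 * x, 0.0) + 0.1 * rng.standard_normal(n)

def slope_and_se(xw, yw):
    """Least-squares slope and its standard error on one window."""
    xc = xw - xw.mean()
    sxx = (xc ** 2).sum()
    slope = (xc * yw).sum() / sxx
    resid = yw - yw.mean() - slope * xc
    s2 = (resid ** 2).sum() / (len(yw) - 2)
    return slope, np.sqrt(s2 / sxx)

def best_window(x, y, center, widths):
    """Pick the half-width maximizing the signal-to-noise ratio |slope|/SE."""
    best = None
    for h in widths:
        mask = np.abs(x - center) <= h
        if mask.sum() < 10:
            continue
        slope, se = slope_and_se(x[mask], y[mask])
        snr = abs(slope) / se
        if best is None or snr > best[0]:
            best = (snr, h, slope)
    return best

snr, h, slope = best_window(x, y, center=1.0, widths=[0.25, 0.5, 1.0, 2.0])
```

Small windows lose because the SE is large (few points); the largest window loses because lack of fit from the flat region inflates the residuals. The criterion lands on the window that exactly covers the rising segment.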

Sparsity in Inference: Past Trends, Future Promise
David Donoho
Statistics Department, Stanford University

Suppose we have to estimate a large number of parameters, most of which are zero or negligible and some of which are important or significant; but we don’t know in advance which parameters are likely to be negligible and which are likely to be important. This important problem in some sense spans large swaths of applied statistics, from regression model building to gene association studies. I’ll discuss some of Peter Bickel’s early work related to this problem, and how the problem has grown and mutated over the years. At this point, it’s a problem with truly vast implications, having applications throughout science and technology, with lots of challenging mathematics and surprising applications.

Methods of Robust Online Signal Extraction and Applications
Ursula Gather
Department of Statistics, University of Dortmund

We discuss filtering procedures for robust extraction of a signal from noisy time series. These methods can, for example, be applied to online observations of vital parameters which are acquired by clinical information systems for critically ill patients. Multivariate time series from online monitoring exhibit trends, abrupt level changes and large spikes (outliers) as well as periods of relative stability. Also, the measurements are overlaid with a high level of noise, and strong dynamic dependencies are found among the variables (Gather et al. (2002)). The challenge is to develop methods that allow a fast and reliable denoising of these time series. Noise and artifacts are to be separated from structural patterns of relevance. Standard approaches to univariate signal extraction are moving averages and (univariate) running medians, but they have shortcomings when outliers or trends occur. Reviewing and extending recent work, we present new methods for robust online signal extraction and discuss their merits for preserving trends, abrupt shifts and extremes and for the removal of spikes (Davies, Fried, Gather (2004)). Our robust regression moving window

methods are applicable even in real time because of increased computational power and fast algorithms (Bernholt and Fried (2003)). In multivariate robust signal extraction, efficiency is lost if the error terms of the variables are highly correlated, since generalizing robust univariate regression methods does not result in affine equivariant procedures. Multivariate affine equivariant regression methods with high breakdown, e.g. MCD-regression (Rousseeuw et al. (2004)), moreover assume that the data are in general position. For discrete data in short time windows this is, however, often not the case. We therefore propose new procedures for multivariate signal extraction which offer fast and robust signal extraction and good efficiency properties, and which can be used for discretely measured data with low variability as well as in situations with many outliers.
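The contrast the abstract starts from, moving average versus running median on spiky data, can be seen in a few lines. This is a toy sketch on an invented trending series with artificial spikes, not the windowed-regression filters of the talk.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(200)
signal = 0.05 * t                        # slowly trending "vital sign"
y = signal + 0.2 * rng.standard_normal(200)
y[[50, 51, 120]] += 8.0                  # large measurement artifacts (spikes)

def running_filter(y, width, stat):
    """Apply `stat` (e.g. np.mean or np.median) over centered windows."""
    half = width // 2
    out = np.empty_like(y)
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        out[i] = stat(y[lo:hi])
    return out

mean_filt = running_filter(y, 11, np.mean)
med_filt = running_filter(y, 11, np.median)

# The moving average smears each spike across its window, while the
# running median removes isolated spikes almost entirely.
mean_err = np.max(np.abs(mean_filt - signal))
med_err = np.max(np.abs(med_filt - signal))
```

The median’s robustness is exactly what breaks down once trends or level shifts must be preserved too, which motivates the regression-based filters of the talk.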

Convergence and Consistency of Newton’s Algorithm for Estimating a Mixing Distribution
Jayanta K. Ghosh
Department of Statistics, Purdue University

In recent years Michael Newton has proposed an algorithmic estimate of a mixing distribution, which is computationally efficient. We prove its convergence and consistency under rather strong conditions. The consistency result is new. A proof of convergence given earlier under the same conditions by Newton is shown to be incomplete and not easily rectifiable. We study various other aspects of the estimate and compare it with the Bayes estimate based on Dirichlet mixtures. This is joint work with Surya Tokdar.
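Newton’s recursive estimate itself is simple to state: each new observation tilts the current mixing distribution toward its one-step posterior, with decreasing weights. The sketch below runs it on a grid of atoms for a two-component Gaussian mixture; it is a schematic of the algorithm (assuming unit-variance Gaussian components and weights w_i = 1/(i+1)), not of the paper’s convergence analysis.

```python
import numpy as np

rng = np.random.default_rng(3)
# Data from the mixture 0.5*N(0,1) + 0.5*N(4,1); the mixing distribution
# Q = 0.5*delta_0 + 0.5*delta_4 is what we try to recover.
n = 2000
means = rng.choice([0.0, 4.0], size=n)
xs = means + rng.standard_normal(n)

grid = np.linspace(-3.0, 7.0, 101)        # support points for the estimate
q = np.full(grid.size, 1.0 / grid.size)   # start from a uniform mixing measure

for i, x in enumerate(xs, start=1):
    lik = np.exp(-0.5 * (x - grid) ** 2)       # N(x | theta, 1), up to a constant
    posterior = q * lik / (q * lik).sum()      # one-step Bayesian update
    w = 1.0 / (i + 1)
    q = (1 - w) * q + w * posterior            # Newton's recursive rule

mass_near_0 = q[np.abs(grid - 0.0) <= 1.0].sum()
mass_near_4 = q[np.abs(grid - 4.0) <= 1.0].sum()
```

After a couple of thousand observations the estimated mixing measure puts substantial mass near both true atoms; whether and when such recursions converge to the truth is precisely the question the paper addresses.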

Edgeworth Approximations for Symmetric Statistics
Friedrich Goetze
Department of Mathematics, University of Bielefeld

We shall describe conditions such that Edgeworth approximations up to an error of order o(N^{-1}) hold for a general class of asymptotically linear symmetric statistics in N independent observations which admit a regular stochastic Hoeffding expansion. The conditions involve Cramer’s condition of smoothness for the linear term and some covariance-type conditions for the second-order term. The results are joint work with M. Bloznelis and extend previous work by P. Bickel, V. Bentkus, W. van Zwet and the author. They are based on new analytical and combinatorial techniques. Connections with approximation results in probability and number theory for related degenerate U-statistics, and their dimension dependence, will be discussed as well.
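For the purely linear case the flavor of such expansions is easy to demonstrate: the one-term Edgeworth correction to the CLT for a standardized mean adds a skewness term of order N^{-1/2}. The following is a textbook illustration on exponential variables, not the symmetric-statistics result of the talk.

```python
import math
import random

# Standardized mean of N exponential(1) variables; skewness gamma = 2.
N = 10
gamma = 2.0

def phi(x):   # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):   # standard normal CDF
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def edgeworth_cdf(x):
    # One-term Edgeworth expansion:
    # F_N(x) ~ Phi(x) - phi(x) * (gamma / (6*sqrt(N))) * (x^2 - 1)
    return Phi(x) - phi(x) * gamma / (6 * math.sqrt(N)) * (x * x - 1)

# Monte Carlo estimate of the true CDF of the standardized mean at x = 0.
random.seed(0)
reps = 200_000
x0 = 0.0
hits = 0
for _ in range(reps):
    s = sum(random.expovariate(1.0) for _ in range(N))
    z = (s - N) / math.sqrt(N)   # sum has mean N and sd sqrt(N)
    if z <= x0:
        hits += 1
emp = hits / reps

normal_err = abs(Phi(x0) - emp)              # plain CLT error
edgeworth_err = abs(edgeworth_cdf(x0) - emp)  # after the skewness correction
```

At x = 0 the normal approximation misses the skewness of the exponential by about 0.04, while the corrected approximation is accurate to Monte Carlo precision.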

Some Theory for Classifiers in High-dimensional, Low Sample Size Settings
Peter Hall
Centre for Mathematics and its Applications, Mathematical Sciences Institute, Australian National University

A large class of distance-based classifiers is defined, and their performance is addressed using theoretical arguments based on letting the dimension diverge as the sample size is kept fixed. Particular attention is paid to the use of truncation, to heighten the sensitivity of the classifiers in cases of data sparsity. It is shown that in that setting, truncated distance-based classifiers can perform well when differences between distributions are detectable but not estimable. They do not do quite as well as classifiers based on Donoho and Jin’s higher-criticism methods, although they are more robust against assumptions about distribution type and component relationships. However, the robustness of higher criticism can be increased by using methods based on thresholding, as well as empirical approaches.
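A minimal instance of a distance-based classifier in this regime assigns each new point to the class with the nearer training centroid. The sketch below uses dimension 5000 with ten training points per class and an invented sparse mean shift; it illustrates only the general setting, not the truncation devices or higher-criticism comparisons of the talk.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n_train = 5000, 10           # dimension far exceeds sample size
shift = np.zeros(p)
shift[:100] = 1.0               # classes differ in only 100 of 5000 coordinates

X0 = rng.standard_normal((n_train, p))           # class 0 training sample
X1 = rng.standard_normal((n_train, p)) + shift   # class 1 training sample
c0, c1 = X0.mean(axis=0), X1.mean(axis=0)        # class centroids

def classify(x):
    """Assign to the class whose centroid is nearer in Euclidean distance."""
    return 0 if np.sum((x - c0) ** 2) <= np.sum((x - c1) ** 2) else 1

test0 = rng.standard_normal((100, p))
test1 = rng.standard_normal((100, p)) + shift
accuracy = (sum(classify(x) == 0 for x in test0)
            + sum(classify(x) == 1 for x in test1)) / 200
```

Even this crude rule classifies well here, but the estimation noise in the centroids grows with the 4900 irrelevant coordinates; truncating those coordinates away is exactly the sensitivity-heightening device the abstract refers to.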

A Statistical Framework to Infer Functional Gene Associations from Multiple Biologically Dependent Microarray Experiments
Haiyan Huang
Department of Statistics, University of California, Berkeley

Microarray data from an increasing number of biologically interrelated and interdependent experiments now allow more complete portrayals of functional gene relationships involved in biological processes. However, in the current integrative analyses of microarray data, an important practical issue is widely ignored: the existence of dependencies among gene expressions across biologically related experiments. When not accounted for, these dependencies (due to either similar intrinsic conditions or relevant external perturbations among the experiments) can result in inaccurate inferences of functional gene associations, and hence incorrect biological conclusions. To address this fundamental problem, we propose a new measure, Knorm correlation, to quantify functional gene associations in the presence of such experimental dependencies. Our intuitive strategy is to reduce the experimental dependencies before estimating gene correlations. The statistical model underlying Knorm correlation is a multivariate normal distribution characterized by a Kronecker product dependency structure. This unique structure maintains the same experimental correlations across genes and the same gene correlations across experiments. The proposed measure simplifies to the Pearson coefficient when experiments are uncorrelated. Applications to simulation studies and to two real datasets (on yeast and human

genes) demonstrate the success of Knorm correlation, and also the adverse impact of experimental dependencies on gene associations when Pearson coefficients are used. Knorm correlation is expected to greatly improve the accuracy of biological inferences made from experiments currently (and incorrectly) assumed to be uncorrelated. This is joint work with Melinda Teng and Xianghong Zhou.
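The core modeling device, a Kronecker product covariance, and the strategy of decorrelating experiments before computing gene correlations can be sketched directly. The toy below cheats by whitening with the *known* experiment covariance (the Knorm estimator must estimate it); the gene and experiment covariances are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
m, k = 4, 500    # m genes measured across k dependent experiments

# Gene-gene correlation we want to recover.
Sigma_g = np.array([[1.0, 0.8, 0.0, 0.0],
                    [0.8, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0],
                    [0.0, 0.0, 0.0, 1.0]])
# Strong AR(1)-type dependence among the experiments.
Sigma_e = 0.9 ** np.abs(np.subtract.outer(np.arange(k), np.arange(k)))

# Matrix-normal data: cov(vec(Y)) = Sigma_g (Kronecker) Sigma_e.
Lg = np.linalg.cholesky(Sigma_g)
Le = np.linalg.cholesky(Sigma_e)
Y = Lg @ rng.standard_normal((m, k)) @ Le.T   # genes x experiments

# Naive approach: Pearson correlation treating experiments as independent.
naive = np.corrcoef(Y)

# Decorrelate the experiments first (whitening with Sigma_e), then
# compute gene correlations on the whitened data.
Y_white = Y @ np.linalg.inv(Le).T
whitened = np.corrcoef(Y_white)
```

After whitening, the columns are independent with covariance Sigma_g, so the gene correlations are estimated at the full effective sample size; the naive Pearson estimate is far noisier because the strongly dependent experiments carry much less information than their count suggests.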

Fence Methods: Another Look at Model Selection
Jiming Jiang
Department of Statistics, University of California, Davis

Many model search strategies involve trading off model fit with model complexity in a penalized goodness-of-fit measure. Asymptotic properties for these types of procedures in settings like linear regression and ARMA time series have been studied. Yet such strategies do not always translate into good finite sample performance. The issue is typically that the procedure is overly sensitive to the setting of penalty parameters, which are required to be increasing functions of the sample size. Furthermore, these strategies do not generalize naturally to more complex models, such as those for modeling clustered data or those that involve adaptive estimation. In these cases, penalties and model complexity may not be naturally defined. We introduce a new class of model selection strategies known as fence methods. The general idea involves a procedure to isolate a subgroup of what are known as correct models (of which the optimal model is a member). This is accomplished by constructing a statistical fence, or barrier, to carefully eliminate incorrect models. Once the fence is constructed, the optimal model is selected from the correct models (those within the fence) according to the simplicity of the models. We describe a variety of fence methods, based on the same principle but applied to different situations. These include regression, least angle regression, linear mixed models for clustered and non-clustered data, generalized linear mixed models for clustered and non-clustered data, and time series models. We show the broad applicability of fence methods to all of these areas by giving a number of examples, each supported by simulation results or real-life data analyses. In terms of theoretical development, we give sufficient conditions for consistency of the fence, a desirable property for a good model selection procedure. This work is joint with J. Sunil Rao, Zhonghua Gu and Thuan Nguyen.
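The fence principle, keep every model whose lack of fit is within a margin of the best and then choose the simplest survivor, can be sketched for polynomial-degree selection. This is a schematic of the principle only: the margin below is an ad hoc constant, whereas the paper calibrates it, and the example problem is invented.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x = rng.uniform(-1, 1, n)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + 0.3 * rng.standard_normal(n)  # true degree 2

def lack_of_fit(degree):
    """Mean squared residual of a polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    return np.mean((y - np.polyval(coeffs, x)) ** 2)

degrees = range(6)
Q = {d: lack_of_fit(d) for d in degrees}
Q_best = min(Q.values())

# Fence: models with Q(M) <= Q(M_best) + c are treated as "correct";
# the margin c is chosen ad hoc here.
c = 0.02
inside_fence = [d for d in degrees if Q[d] <= Q_best + c]
chosen = min(inside_fence)   # simplest model within the fence
```

Degrees 0 and 1 fall far outside the fence; degrees 2 through 5 are all inside, since the higher degrees reduce the residuals only by noise-level amounts, and the rule then picks the simplest of them.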

Goodness-of-fit Testing in Interval Censoring Case 1
Hira L. Koul
Department of Statistics and Probability, Michigan State University

In the interval censoring case 1, an event occurrence time is unobservable, but one observes an inspection time and whether the event has occurred prior to this time or not. The focus here is to provide tests of goodness-of-fit hypotheses pertaining to the distribution of the event occurrence time. The proposed tests are based on certain marked empirical processes for testing a simple hypothesis and their martingale transforms. These tests are asymptotically distribution-free, consistent against a large class of fixed alternatives, and have nontrivial asymptotic power against a large class of local alternatives.

Edgeworth Expansions for Sums of Block-variables under Weak Dependence
Soumendra N. Lahiri
Department of Statistics, Iowa State University

Let {X_i : i = ..., -1, 0, 1, ...} be a sequence of random vectors and let Y_in = f_in(X_{i,l}) be zero-mean block variables, where X_{i,l} = (X_i, ..., X_{i+l-1}), i >= 1, are overlapping blocks of length l and the f_in are Borel measurable functions. This paper establishes valid joint asymptotic expansions of general order for the joint distribution of the sums sum_{i=1}^{n} X_i and sum_{i=1}^{n} Y_in under weak dependence conditions on the sequence {X_i}, when the block length l grows to infinity. In contrast to classical Edgeworth expansion results, where the terms in the expansion are given by powers of n^{-1/2}, the expansions derived here are mixtures of two series, one in powers of n^{-1/2} and the other in powers of [n/l]^{-1/2}. Applications of the expansions to studentized statistics and to block bootstrap methods for time series data are given.

Detection in Wireless Sensor Networks
Elizaveta Levina
Department of Statistics, The University of Michigan

Wireless sensor networks are becoming more widely available for use in various applications, such as intruder detection and ecological monitoring. The basic issues in sensor networks (detection, estimation, design) are statistical, but little work in this area has been

done by statisticians. I will give a brief overview of the main problems and then focus on a local-vote decision algorithm we developed for target detection by a wireless sensor network. Sensors acquire measurements corrupted by noise, make individual decisions, correct their decisions after consulting the neighboring sensors, and then a collective decision is made by the network. Related local methods have been proposed by engineers, but no theoretical performance guarantees were available. We give an explicit formula for the decision threshold for a given false alarm rate, based on limit theorems for weakly dependent random fields. We also show that, for a fixed false alarm rate, the local-vote correction significantly improves the target detection rate. Joint work with George Michailidis and Natallia Katenka.
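The local-vote idea can be sketched on a grid of sensors: each sensor thresholds its own noisy reading, then adopts the majority decision of its neighborhood. This toy omits what the talk actually contributes, the threshold calibration and the limit theory; grid size, noise level, and the 3x3 neighborhood are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
side = 40
# Target present in a central square region of the sensor grid.
truth = np.zeros((side, side), dtype=bool)
truth[15:25, 15:25] = True

# Each sensor measures signal (1 inside the target) plus noise and decides alone.
readings = truth.astype(float) + 0.8 * rng.standard_normal((side, side))
individual = readings > 0.5

# Local-vote correction: adopt the majority decision of the 3x3 neighborhood
# (self included); edges are padded by replicating the border decisions.
padded = np.pad(individual, 1, mode="edge").astype(int)
votes = sum(padded[di:di + side, dj:dj + side]
            for di in range(3) for dj in range(3))
corrected = votes >= 5    # majority of the 9 cells

individual_errors = np.sum(individual != truth)
corrected_errors = np.sum(corrected != truth)
```

Individual decisions are wrong at roughly a quarter of the sensors, while the corrected map confines most of its errors to the target boundary, which is the intuition behind the detection-rate improvement quantified in the talk.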

Bayesian Methods in Haplotype Inference and Disease Mapping
Jun Liu
Department of Statistics, Harvard University

Haplotypes provide complete information on inheritance and are very useful in population genetics and association studies. Since experimentally determining haplotype data is expensive, much effort has been devoted to developing computational tools for inferring haplotypes from genotype data. I will present a few Bayesian and semi-Bayesian models that have been formulated over the past few years for this task, including a new hierarchical Bayes model developed in our group that incorporates the coalescence effect in a prior distribution. The prediction accuracy of the new method is uniformly improved compared to existing methods such as HAPLOTYPER and PHASE. I will further discuss a Bayesian approach to detecting multi-locus interactions (epistasis) for case-control association studies. Existing methods are either of low power or computationally infeasible when faced with a large number of markers. Using MCMC sampling techniques, the method can efficiently detect interactions among thousands of markers. Using simulation results, I will discuss the power of our approach and the importance of considering epistasis in association mapping. Based on joint work with Yu Zhang and Tim Niu.

Mining Massive Text Data: Classification, Construction of Tracking Statistics and Inference under Misclassification Regina Liu Department of Statistics, Rutgers University

We present a systematic data mining procedure for exploring large free-style text datasets to discover useful features and develop tracking statistics (often referred to as performance measures or risk indicators). The procedure includes text classification, construction of tracking statistics, inference under measurement errors, and risk analysis. The main difficulty in deriving this inference scheme is accounting for misclassification errors, for which we propose two types of approaches: “plug-in” and “projection” methods. We also consider bootstrap calibration for fine tuning. Finally, as an illustrative example, the proposed data mining procedure is applied to an aviation safety report repository from the FAA to show its utility in aviation risk management and general decision-support systems. Although most illustrations here are drawn from aviation safety data, the proposed data mining procedure applies to many other domains, including, for example, mining free-style medical reports for tracking possible disease outbreaks. This is joint work with Daniel Jeske, Department of Statistics, UC Riverside.

Statistical Physics and Statistical Computing: A Critical Link– Estimating Criticality via Perfect Sampling Xiao-Li Meng Department of Statistics, Harvard University

This talk is based on the following chapter, jointly written with James Servidea of the U.S. Department of Defense, in the volume dedicated to Professor Peter Bickel: “The main purpose of this chapter is to demonstrate the fruitfulness of cross-fertilization between statistical physics and statistical computation, by focusing on the celebrated Swendsen-Wang algorithm for the Ising model and its recent perfect sampling implementation by Mark Huber. In particular, by introducing the Hellinger derivative as a measure of instantaneous changes of distributions, we provide probabilistic insight into the algorithm’s critical slowing down at the phase transition point. We show that at or near the phase transition, an infinitesimal change in the temperature parameter of the Ising model causes an astronomical shift in the underlying state distribution. This finding suggests an interesting conjecture linking the critical slowing down in coupling time with the grave instability of the system as characterized by the Hellinger derivative (or, equivalently, by Fisher information). It also suggests that we can approximate the critical point of the Ising model, a physics quantity, by monitoring the coupling time of Huber’s bounding chain algorithm, an algorithmic quantity. This finding might provide an alternative way of approximating the criticality of thermodynamic systems, which is typically intractable analytically. We also speculate that whether we can turn perfect sampling from a pet pony into a workhorse for general scientific computation may depend critically on how successfully we can engage, in its development, researchers from statistical physics and related scientific fields.”

Smoothing Large Tables Stephan Morgenthaler EPFL Learning Center

Methods to smooth large tables are described. Such smoothing problems are of interest in many scientific contexts and with a variety of objectives in mind. One may want to interpolate the table entries, to quantify the differences between rows and columns, to classify rows and columns into homogeneous subgroups, to find the best rows and columns, or to pursue some other objective. Fisher’s ANOVA, which can be computed by sweeping row means and column means from the table, assigns a single effect to each row and each column and was originally invented for tables of low dimension. The singular value decomposition of the table offers an alternative single-effects approximation. In both cases, the smoothed row traces, that is, the plots of the row entries against the row effects, are straight lines. More general table smoothers are obtained by using more flexible traces. Some of the difficulties with this approach are discussed, among them the choice of row and column variables replacing the single effects from above, the parsimonious choice of trace parameters, the classification of traces, and the transformation of table entries.
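The mean-sweeping computation behind Fisher's additive fit is short enough to spell out. A minimal sketch under an assumed function name, not the speaker's code:

```python
def sweep_additive_fit(table):
    """Fisher's additive (single-effect) fit obtained by sweeping means:
    fitted[i][j] = grand mean + row effect i + column effect j."""
    nrow, ncol = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (nrow * ncol)
    row_eff = [sum(row) / ncol - grand for row in table]
    col_eff = [sum(table[i][j] for i in range(nrow)) / nrow - grand
               for j in range(ncol)]
    return [[grand + row_eff[i] + col_eff[j] for j in range(ncol)]
            for i in range(nrow)]

# A perfectly additive 2x3 table is reproduced exactly by the sweep.
table = [[1.0, 2.0, 3.0],
         [3.0, 4.0, 5.0]]
fit = sweep_additive_fit(table)
```

For tables with interaction, the residuals `table - fit` are exactly what the more flexible trace-based smoothers of the talk are meant to capture.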

Functional Variance Hans-Georg Müller Department of Statistics, University of California, Davis

Functional data consist of an observed sample of smooth random trajectories. A key tool for the analysis of such data is a representation in terms of eigenfunctions of the autocovariance operator of the underlying stochastic process and the associated functional principal components. In some applications the information of interest resides not in the observed smooth random trajectories themselves but rather in the additive noise. Assuming the noise is composed of a white noise component and a smooth random process component, we refer to the latter as the functional variance process. This process can

then be decomposed in terms of its eigenfunctions. Methods to estimate the eigenfunctions and functional principal component scores for the functional variance process are based on residuals obtained in an initial smoothing step applied to the original data. We discuss asymptotic justifications and applications. (Joint work with U. Stadtmüller and F. Yao.)

Statistical Inverse Problems in Active Network Tomography Vijay Nair Department of Statistics, Department of Industrial & Operations Engineering, University of Michigan, Ann Arbor

The term network tomography, first introduced in Vardi (1996), characterizes two classes of large-scale inverse problems that arise in the modeling and analysis of computer and communications networks. This talk will deal with active network tomography where the goal is to recover link-level quality of service parameters, such as packet loss rates and delay distributions, from end-to-end path-level measurements. Internet service providers use this to characterize network performance and to monitor service quality. We will provide a review of recent developments, including the design of probing experiments, inference for loss rates and delay distributions, and applications to network monitoring. This is joint work with George Michailidis, Earl Lawrence, Bowei Xi, and Xiaodong Yang.

Estimation and Testing for Varying Coefficients in Additive Models with Marginal Integration Byeong Park Department of Statistics, Seoul National University

We propose marginal integration estimation and testing methods for the coefficients of a varying-coefficient multivariate regression model. Asymptotic distribution theory is developed for the estimation method, which enjoys the same rate of convergence as univariate function estimation. For the test statistic, asymptotic normal theory is established. These theoretical results are derived under the fairly general conditions of absolute regularity (β-mixing). Application of the test procedure to the West German real GNP data reveals that a partially linear varying-coefficient model is the most parsimonious fit to the data dynamics, a fact that is also confirmed by residual diagnostics.

Applied Asymptotics Nancy Reid Department of Statistics, University of Toronto

The theory of higher order asymptotics provides quite accurate approximations for a large number of parametric models. However, the details of the theory are somewhat complicated, and perhaps for that reason the methods are not used as often as they might be. I will outline some “case studies” where improved approximation is readily implemented and illustrate the effects on the resulting inference. I will suggest areas where further research is needed.

Multiple Testing in Astronomy John Rice Department of Statistics, University of California, Berkeley

Suppose that a very large number of independent null hypotheses are tested, almost all of which are true. How can the proportion of false null hypotheses be estimated? For motivation, I will discuss the Taiwanese-American Occultation Survey, and will explain how this question arises. I will then present some recent results.
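For the question posed above, one standard answer in the literature (a Storey-type estimator, offered here only as background; the talk's own results may differ) exploits the fact that p-values of true nulls are Uniform(0,1), so the count of p-values above a cutoff λ estimates the null proportion:

```python
def estimate_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the proportion of TRUE null hypotheses:
    for true nulls, P(p > lam) = 1 - lam, so #{p > lam} estimates
    pi0 * (1 - lam) * n.  The proportion of FALSE nulls is then 1 - pi0."""
    n = len(pvalues)
    return min(1.0, sum(p > lam for p in pvalues) / ((1.0 - lam) * n))

# 90 roughly uniform null p-values plus 10 tiny p-values from false nulls
pvals = [(i + 0.5) / 90 for i in range(90)] + [1e-6] * 10
pi0 = estimate_pi0(pvals)   # close to 0.9
```

The hard regime discussed in the talk is when almost all nulls are true and the false nulls are only slightly non-null, where such simple counts become unreliable.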

Some Remarks on Non-linear Dimension Reduction Ya’acov Ritov Israel Social Sciences Data Center

We remark on the possibility of a well-defined dimension reduction. We consider a model in which the data are distributed on a manifold. We present an algorithm for generating a global map of the data to a lower-dimensional space while preserving the local structure of the manifold. We also remark on the importance of estimating the manifold structure when the main concern is estimating a regression function.

Efficient Estimators for Time Series Anton Schick Department of Mathematical Sciences, Binghamton University

I illustrate several recent results on efficient estimation for semiparametric time series models with a simple class of models: first-order nonlinear autoregression with independent innovations. In particular I consider estimation of the autoregression parameter, the innovation distribution, conditional expectations, the stationary distribution, the stationary density, and higher-order transition densities.

Bayesian Inference in Central Banks: Recent Developments in Monetary Policy Modeling Christopher A. Sims Department of Economics, Princeton University

In the 1950s and 60s, large-scale econometric models, grounded in an elegant theory of inference initiated by Trygve Haavelmo, began to be widely used by policy-making institutions. While the models remained in use, their grounding in a theory of inference had almost completely disappeared by 2000. In the last few years, there has been research activity in many central banks aimed at producing models grounded in a Bayesian approach to inference and using modern computational approaches to posterior simulation. This talk summarizes the history and describes the methods and results driving the current research.

Invariant Coordinate Selection (ICS): A Robust Statistical Perspective on Independent Component Analysis (ICA) David E. Tyler Department of Statistics, Rutgers University

In many disciplines, independent component analysis (ICA) has become a popular method for analyzing multivariate data. Independent component analysis typically assumes the observed data Y ∈ ℜ^p is generated by a nonsingular affine transformation of independent components, i.e. Y = AZ, where A is a nonsingular matrix and Z = (Z_1, ..., Z_p)′ consists of independent variables Z_1, ..., Z_p. The objective is then to estimate A and hence recover Z. Approaches for recovering Z have often been successful in exploring multivariate data in general, i.e. in cases where the ICA model may not hold. The purpose of this talk is to provide some understanding as to why independent component analysis may work well as a general multivariate method. In particular, without reference to the ICA model, it can be noted that for some methods the recovered Z can be viewed as affine invariant coordinates. That is, if we transform Y → Y* = BY + b for any nonsingular B, then Z* = ΔZ + c, where Δ is a nonsingular diagonal matrix. In other words, the standardized versions of the components Z*_j and Z_j are the same. Hence the terminology invariant coordinate selection (ICS). Consequently, this leads to the development of a wide class of affine equivariant coordinatewise methods for multivariate data. Some methods to be discussed are affine equivariant principal components, robust estimates of multivariate location and scatter, affine invariant multivariate nonparametric tests, affine invariant multivariate distribution functions, and affine invariant coordinate plots. The affine equivariant principal components and the corresponding affine invariant coordinate plots can be regarded, in a sense, as projection pursuit without the pursuit. Several examples are given to illustrate the utility of the proposed methods.

Oracle Inequalities for the LASSO Sara van de Geer Seminar für Statistik, ETH Zürich

We consider the LASSO penalty for general M-estimators. Examples include logistic regression, quantile regression, log-density estimation, and boosting with, for example, logistic loss or hinge loss. Let Y be a real-valued (response) variable and X be a (co-)variable with values in some space X. Let

F ⊂ { f_α = Σ_{k=1}^m α_k ψ_k }

be a (convex subset of a) linear space of functions on X. Here, {ψ_k}_{k=1}^m is a given system of base functions. Let γ_f : R × X → R be some loss function, and let {(Y_i, X_i)}_{i=1}^n be i.i.d. copies of (X, Y). We consider the estimator

f̂ = argmin_{f_α ∈ F} { (1/n) Σ_{i=1}^n γ_{f_α}(Y_i, X_i) + λ̂ Î(α) },

where Î(α) := Σ_{k=1}^m τ̂_k |α_k| denotes the weighted ℓ1 norm of the vector α ∈ R^m, with random weights τ̂_k := ( (1/n) Σ_{i=1}^n ψ_k^2(X_i) )^{1/2}. We study the situation where the number of parameters m is large (possibly much larger than the number of observations n). Our purpose is threefold. Firstly, we want to show that for a proper choice of the smoothing parameter λ̂ (possibly depending on {τ̂_k}), the estimator f̂ satisfies an oracle inequality. Secondly, we want the result to hold without any a priori bounds on the functions in F. Thirdly, we aim at “reasonable” values for the constants involved, as an indication that the result is not merely an asymptotic one. In certain settings, the smoothing parameter λ̂ can be chosen asymptotically equal to 4 √(2 log m / n), which is four times as large as in the linear Gaussian case with soft thresholding. The factor 4 comes from using a symmetrization and a contraction inequality.
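The penalty ingredients of the abstract translate directly into code. A sketch of the quantities only (the M-estimation itself would require an optimizer), with illustrative function names:

```python
import math

def random_weights(psi_matrix):
    """tau_hat_k = ((1/n) * sum_i psi_k(X_i)^2)^(1/2), where psi_matrix[i][k]
    holds psi_k(X_i) for observation i and base function k."""
    n, m = len(psi_matrix), len(psi_matrix[0])
    return [math.sqrt(sum(psi_matrix[i][k] ** 2 for i in range(n)) / n)
            for k in range(m)]

def weighted_l1(alpha, tau):
    """I_hat(alpha) = sum_k tau_hat_k * |alpha_k|, the weighted l1 penalty."""
    return sum(t * abs(a) for a, t in zip(alpha, tau))

def lambda_hat(m, n):
    """The asymptotic choice 4 * sqrt(2 * log(m) / n) quoted in the abstract."""
    return 4.0 * math.sqrt(2.0 * math.log(m) / n)
```

With these pieces, the estimator minimizes average loss plus `lambda_hat(m, n) * weighted_l1(alpha, random_weights(psi_matrix))` over the coefficient vector `alpha`.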

Estimating Function Based Cross-validation Mark van der Laan Division of Biostatistics, University of California, Berkeley

Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Given a model for the data generating distribution, assume that the parameter of interest can be characterized as the parameter value which makes the population mean of a possibly infinite-dimensional estimating function equal to zero. Given a collection of candidate estimators of this parameter, and a specification of the vector estimating function, we propose a norm of the cross-validated estimating equation as a criterion for selecting among these estimators. For example, if we use the Euclidean norm, then our criterion is defined as the Euclidean norm of the empirical mean, over the validation sample, of the estimating function at the candidate estimator based on the training sample. We establish a finite sample inequality for this method relative to an oracle selector, and illustrate it with some examples. This finite sample inequality also provides asymptotic equivalence of the selector with the oracle selector under general conditions. We also study the performance of this method in the case that the parameter of interest itself is pathwise differentiable (and thus, in principle, root-n estimable).
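The Euclidean-norm criterion described above can be illustrated with a toy selector. The names and the single train/validation split are assumptions for illustration, not the proposed procedure in full:

```python
import math

def cv_select(candidates, estimating_fn, train, valid):
    """Pick the candidate whose fit on the training sample makes the
    validation-sample mean of the (vector-valued) estimating function
    closest to zero in Euclidean norm."""
    best, best_norm = None, float("inf")
    for name, fit in candidates.items():
        theta = fit(train)
        comps = [estimating_fn(theta, x) for x in valid]
        mean = [sum(c[j] for c in comps) / len(comps)
                for j in range(len(comps[0]))]
        norm = math.sqrt(sum(m * m for m in mean))
        if norm < best_norm:
            best, best_norm = name, norm
    return best

# Toy example: estimating a mean, with estimating function D(theta, x) = (x - theta,)
train = [1.0, 2.0, 3.0, 10.0]
valid = [1.5, 2.5, 2.0, 2.2]
candidates = {
    "mean": lambda xs: sum(xs) / len(xs),           # pulled up by the outlier 10.0
    "median": lambda xs: sorted(xs)[len(xs) // 2],  # robust
}
print(cv_select(candidates, lambda t, x: (x - t,), train, valid))
```

Here the criterion favors the candidate whose estimate transfers best to the validation sample, which is the behavior the oracle inequality of the talk quantifies.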

An Expansion for A Discrete Non-lattice Distribution Willem R. van Zwet Department of Statistics, University of Leiden

Much is known about asymptotic expansions for asymptotically normal distributions if these distributions are either absolutely continuous or pure lattice distributions. In this paper we begin an investigation of the discrete but non-lattice case. We tackle one of the simplest examples imaginable and find that curious phenomena occur. Clearly more work is needed. (Co-author: Friedrich Götze.)

Flexible Approaches to Model Survival and Longitudinal Data Jointly Jimin Ding and Jane-Ling Wang (Speaker) Department of Statistics, University of California at Davis

In clinical studies, longitudinal covariates are often used to monitor the progression of the disease as well as survival time. The relationship between a failure time process and longitudinal covariates is of key interest, and so is understanding the pattern of the longitudinal process, to learn more about the health status of patients or to gain insight into the progression of the disease. Joint modeling of the longitudinal and survival data has certain advantages and has emerged as an effective way to handle both types of data simultaneously. In this talk, we will explore several intriguing and challenging issues in joint modelling. Typically, a parametric longitudinal model is assumed to facilitate the likelihood approach. However, the choice of a proper parametric model turns out to be more elusive than in standard longitudinal studies where no survival end-point occurs. Furthermore, the computational burden due to both Monte Carlo numerical integration and the EM (expectation-maximization) algorithm is an important concern in the joint modelling setting. To deal with these challenges, we propose several flexible longitudinal models in the joint modelling setting. Simplicity of the model structure is crucial for good numerical stability, and we will illustrate this through numerical studies and data analysis.

Goodness of Fit via Phi-divergences: A New Family of Test Statistics Jon A. Wellner Department of Statistics, University of Washington

A new family of goodness-of-fit tests based on phi-divergences is introduced and studied. The new family is based on phi-divergences somewhat analogously to the phi-divergence tests for multinomial distributions introduced by Cressie and Read (1984), and is indexed by a real parameter s ∈ R: s = 2 gives the Anderson-Darling test statistic, s = 1 gives the Berk-Jones test statistic, s = 1/2 gives a new (Hellinger-distance type) statistic, s = 0 corresponds to the “reversed Berk-Jones” statistic, and s = −1 gives a “studentized” (or empirically weighted) version of the Anderson-Darling statistic. We also introduce corresponding integral versions of the new statistics. We show that the asymptotic null distribution theory of Jaeschke (1979) and Eicker (1979) for the Anderson-Darling statistic, and of Berk and Jones (1979), applies to the whole family of statistics Sn(s) with s ∈ [−1, 2]. We also provide new finite-sample approximations to the null distributions and show how the new approximations can be used to obtain accurate computation of quantiles. On the side of power behavior, we show that for 0

We extend the results of Donoho and Jin (2004) by showing that all our new tests for s ∈ [−1, 2] have the same “optimal detection boundary” for normal shift mixture alternatives as Tukey’s “higher-criticism” statistic and the Berk-Jones statistic.

Heterogeneous Autoregressive Realized Volatility Model Yazhen Wang Department of Statistics, University of Connecticut

Volatilities of asset returns are pivotal for many issues in financial economics. The availability of high-frequency intraday data should allow us to estimate volatility more accurately. Realized volatility is often used to estimate integrated volatility. To obtain better volatility estimation and forecasts, autoregressive structures for realized volatility have been proposed in the literature. This talk will present my recent work on heterogeneous autoregressive models of realized volatility.
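A common heterogeneous autoregressive specification in the literature (the HAR-RV model of Corsi) regresses next-day realized volatility on daily, weekly, and monthly averages. A sketch of the regressor construction, offered as background and not necessarily the talk's exact model:

```python
def har_features(rv, t):
    """HAR-RV regressors at day t (requires t >= 21): today's realized
    volatility together with its 5-day (weekly) and 22-day (monthly)
    moving averages.  The regression is then
        RV[t+1] = b0 + b_d*daily + b_w*weekly + b_m*monthly + noise."""
    daily = rv[t]
    weekly = sum(rv[t - 4:t + 1]) / 5.0
    monthly = sum(rv[t - 21:t + 1]) / 22.0
    return daily, weekly, monthly
```

The "heterogeneous" label refers to the mixing of these horizons, which mimics market participants trading at different frequencies.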

Bayesian Hierarchical Modeling for Integrating Low-accuracy and High-accuracy Experiments Jeff Wu Georgia Institute of Technology, School of Industrial and Systems Engineering

Standard practice in analyzing data from different types of experiments is to treat data from each type separately. By borrowing strength across multiple sources, an integrated analysis can produce better results. Careful adjustments need to be made to incorporate the systematic differences among the various experiments. To this end, some Bayesian hierarchical Gaussian process (BHGP) models are proposed. The heterogeneity among different sources is accounted for by performing flexible location and scale adjustments. The approach tends to produce predictions closer to those from the high-accuracy experiment. The Bayesian computations are aided by the use of Markov chain Monte Carlo and Sample Average Approximation algorithms. The proposed method is illustrated with two examples: one with detailed and approximate finite element simulations for mechanical material design, and the other with physical and computer experiments. (Based on joint work with Zhiguang Qian.)

Semiparametric Mixed Effects Models for Duration and Longitudinal Data Zhiliang Ying Department of Statistics, Columbia University

In this talk, I will present a doubly semiparametric mixed effects model for duration and recurrent event time data. This model is useful in accommodating possible informative censoring, a common problem in many follow-up studies. It also exhibits interesting features which make it relatively easy to carry out the usual statistical inferences. We show the usefulness and practicality of the proposed approach via theoretical properties, simulation results and data analysis. Some additional developments on linear mixed effects model for longitudinal data will also be presented.

Spatially Adaptive Functional Linear Regression with Functional Smooth Lasso Chunming Zhang Department of Statistics, University of Wisconsin

In this paper we consider the setting where the regressor is functional data, such as a curve or an image, and the response is a scalar. We propose the “functional smooth lasso” (FSL) approach to simultaneously regularize the roughness and the size of the nonzero regions of the functional linear regression estimates. An efficient algorithm is developed for computing the FSL. The degrees of freedom of the FSL are derived and incorporated into the automatic tuning of regularization parameters. Furthermore, we prove the consistency and the convergence rate of the FSL. An interesting finding is that the convergence rate depends on the degree of “smoothness” of the predictors. The proposed method is illustrated via simulation studies and a real data application.

Multi-Dimensional Trimming Based on Data Depth Yijun Zuo Department of Statistics and Probability, Michigan State University, East Lansing

With a natural order principle, trimming in one dimension is straightforward. One-dimensional trimmed means are among the most popular estimators of the center of data and have been used in various fields of statistics and in our daily life. Trimmed means can overcome the high sensitivity of the mean to outliers (or heavy-tailed data) and the low efficiency of the median for light-tailed data. Hence they can serve as compromises between the mean and the median.

Multi-dimensional data often contain outliers, which typically are far more difficult to detect than in one dimension. A robust procedure such as multi-dimensional trimming that can automatically detect outliers or “heavy tails” is thus desirable. The task of trimming, however, becomes non-trivial, for there is no natural order principle in high dimensions. In this talk, multi-dimensional trimming based on “data depth” is discussed. It is found that depth-trimmed means can possess very desirable properties such as high efficiency and high robustness. Furthermore, inference procedures based on the depth-trimmed means can outperform the classical Hotelling’s T² (and the univariate t) ones. Applications of data depth trimming such as clustering and dimension reduction are also addressed. Contributions of Professor Bickel to trimming are discussed.
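The one-dimensional trimmed mean that the abstract takes as its starting point is simple to state in code (a minimal sketch with an assumed name; depth-based multi-dimensional trimming replaces the sort by a data-depth ordering):

```python
def trimmed_mean(xs, prop):
    """One-dimensional trimmed mean: drop the prop fraction of smallest and
    largest observations, then average the rest."""
    xs = sorted(xs)
    k = int(len(xs) * prop)                    # observations trimmed per tail
    kept = xs[k:len(xs) - k] if k else xs
    return sum(kept) / len(kept)

print(trimmed_mean([1, 2, 3, 4, 100], 0.2))    # drops 1 and 100
```

At `prop = 0` this is the mean; as `prop` approaches 0.5 it approaches the median, which is the mean-median compromise described above.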

Workshop Participants

Name Institution Email Address Yacine Ait-Sahalia Princeton University [email protected] Beth Andrews Northwestern University [email protected] Donald W.K. Andrews Yale University [email protected] Alex Bajamonde Genentech Inc. [email protected] Peter J. Bickel Univ. of California, Berkeley [email protected] Steinar Bjerve University of Oslo [email protected] David Blei Princeton University [email protected] Howard Bondell North Carolina State Univ. [email protected] Peter B¨uhlmann Swiss Federal Institute of Tech. Zurich [email protected] Christopher Calderon Princeton University [email protected] Melissa Carroll Princeton University [email protected] Serena Chan Cornell University [email protected] Scott Chasalow Bristol-Myers Squibb [email protected] Aiyou Chen Bell Labs, Lucent Tech. [email protected] Ming-Yen Cheng National Taiwan University [email protected] Shojaeddin Chenouri University of Waterloo [email protected] Laura Chioda Princeton University [email protected] Gregory Chow Princeton University [email protected] Erhan Cinlar Princeton University [email protected] Anirban Dasgupta Purdue University [email protected] Savas Dayanik Princeton University [email protected] Aurore Delaigle Univ. of California, San Diego [email protected] Jimin Ding Univ. of California, Davis jmding@wald. ucdavis.edu Kjell Doksum Univ. of California, Berkeley [email protected] David Donoho Stanford University [email protected] Miriam G. Donoho San Jose State Univ. [email protected] Juan Du Michigan State University [email protected] Veronica Esaulova Otto von Guericke Univ. [email protected] Yingying Fan Princeton University [email protected] Julian Faraway University of Michigan [email protected] Luisa T. Fernholz Princeton University [email protected] Don Fraser University of Toronto [email protected] Mendel Fygenson Univ. of Southern California [email protected]


Name Institution Email Address Anne Gadermann Univ. of [email protected] Ursula Gather University of Dortmund [email protected] Zhiyu Ge Merrill Lynch gary−[email protected] Jayanta K. Ghosh Purdue University [email protected] Sujit Ghosh North Carolina State Univ. [email protected] Subhashis Ghoshal North Carolina State Univ. [email protected] Friedrich Goetze University of Bielefeld [email protected] Wenceslao G. Manteiga Univ. of Santiago de Compostela [email protected] Jiezhun Gu North Carolina State Univ. [email protected] Arjun Gupta Bowling Green State Univ. [email protected] Peter G. Hall The Australian National Univ. [email protected] Hillary Han Cornell University hillary−[email protected] Jaroslaw Harezlak Harvard University [email protected] Nick Hengartner Los Alamos National Laboratory [email protected] Moonseong Heo Cornell University [email protected] David Hitchcock Univ. of South Carolina [email protected] Haiyan Huang Univ. of California at Berkeley [email protected] Li-Shan Huang University of Rochester [email protected] Tao Huang Yale University [email protected] Ben Huang Bristol-Myers Squibb [email protected] Xiaoming Huo Univ. of California at Riverside [email protected] Ed Ionides University of Michigan [email protected] Barry James Univ. of Minnesota, Duluth [email protected] Kang James Univ. of Minnesota Duluth [email protected] Yuan Ji The University of Texas [email protected] Jiancheng Jiang Princeton University [email protected] Jiming Jiang University of California, Davis [email protected] Kun Jin FDA/CDER/OB/DBI [email protected] Zhezhen Jin Columbia University [email protected] Rebecha Jornsten Rutgers University [email protected] Noureddine El Karoui UC Berkeley [email protected] Katerina Kechris Univ. of Colorado Health Sci. Center [email protected] Abbas Khalili University of Waterloo [email protected] Chris Klaassen Universiteit van Amsterdam [email protected] Hira Koul Michigan State University [email protected]


Name Institution Email Address Sanjeev Kulkarni Princeton University [email protected] Jaimyoung Kwon Cal State East Bay [email protected] Soumendra N. Lahiri Iowa State University [email protected] Clifford Lam Princeton University [email protected] Hyunsook Lee Pennsylvania State Univ. [email protected] Elizaveta Levina University of Michigan [email protected] Michael Levine Purdue University [email protected] Hongzhe Li University of Pennsylvania [email protected] Lexin Li North Carolina State Univ. [email protected] Runze Li Pennsylvania State Univ. [email protected] Chaobin Liu Bowie State University [email protected] Jun Liu Harvard University [email protected] Mengling Liu New York University [email protected] Regina Liu Rutgers University [email protected] Yanning Liu Cornell University [email protected] Yufeng Liu Univ. of North Carolina yfl[email protected] Markus Loecher Rutgers University [email protected] Adriana Lopes University of Pittsburgh [email protected] Panos Lorentziadis Hellenic American Univ. [email protected] Aurelie Lozano Princeton University [email protected] Ying Lu Harvard University [email protected] Jun Luo Michigan State University [email protected] Jinchi Lv Princeton University [email protected] Loriano Mancini University of Zrich [email protected] David Masson University of Delaware [email protected] Jon McAuliffe University of Pennsylvania [email protected] Anjana Meel University of Pennsylvania [email protected] Xiaoli Meng Harvard University [email protected] Oksana Mokliatchouk Bristol-Myers Squibb [email protected] Stephan Morgenthaler EPFL Learning Center stephan.morgenthaler@epfl.ch Akira Morita Georgia Tech [email protected] Hans M¨ueller Univ. of California Davis [email protected] Yolanda Munoz University of Texas [email protected] Vijay Nair University of Michigan [email protected] Jan Neumann Simens Corporate Research [email protected] Yue Niu Princeton University [email protected]


Name Institution Email Address Juan Carlos Pardo University of Vigo [email protected] Byeong Park Seoul National University [email protected] Emanuel Parzen Texas A&M University [email protected] Heng Peng Princeton University [email protected] Jianan Peng Acadia University [email protected] Quang Pham University of Alaska Fairbanks [email protected] Nancy Reid University of Toronto [email protected] Philip Reiss Columbia University [email protected] John Rice Univ. of California, Berkeley [email protected] Yaacov Ritov Israel Social Sci. Data Center [email protected] Alex Rojas Carnegie Mellon University [email protected] Juan Romo Universidad Carlos III de Madrid [email protected] Kaisiromwe Sam Uganda Bureau of Statistics [email protected] Alexander Samarov MIT [email protected] Richard Samworth University of Cambridge [email protected] Stanley Sawyer Washington University [email protected] Robert Schapire Princeton University [email protected] Anton Schick Binghamton University [email protected] Damla Senturk Penn State Univ. [email protected] Chris Sims Princeton University [email protected] Dan Spitzner Virginia Tech [email protected] Curtis Storlie North Carolina State Univ. [email protected] Umar Syed Princeton University [email protected] Nian-Sheng Tang Columbia University [email protected] Shijie Tang Univ. of Wisconsin at Madison [email protected] Tiejun Tong Yale University [email protected] David Tyler Rutgers University [email protected] Sara van der Geer Swiss Federal Institute of Tech. Zurich [email protected] Mark van der Laan Univ. of California, Berkeley [email protected] Willem van Zwet University of Leiden [email protected] Bob Vanderbei Princeton University [email protected] Aldo Jose Viollaz Univ. Nac. De Tucuman [email protected] Haiyan Wang Kansas State University [email protected] Jane-Ling Wang Univ. of California at Davis [email protected] Naisyin Wang Texas A&M University [email protected] Paul C. 
Wang CPR & CDR Technologies, Inc [email protected]


Name Institution Email Address Qing Wang Princeton University [email protected] Xiaohui Wang University of Virginia [email protected] Yonghua Wang Bristol-Myers Squibb [email protected] Jon Wellner University of Washington [email protected] Roy Welsch MIT [email protected] Yazhen Wang University of Connecticut [email protected] Baolin Wu University of Minnesota [email protected] Jeff C. Wu Georgia Institute of Tech. jeff[email protected] Qiang Wu University Pittsburgh [email protected] Yichao Wu University of North Carolina [email protected] Joseph A. Yahav The Hebrew Univ. of Jerusalem [email protected] Zhiliang Ying Columbia University [email protected] Angela Yu Princeton University [email protected] Peng Zeng Auburn University [email protected] Chongqi Zhang Guangzhou University [email protected] Chunming Zhang Univ. of Wisconsin at Madison [email protected] Hao Zhang North Carolina State Univ. [email protected] Heping Zhang Yale University [email protected] Jingjin Zhang Princeton University [email protected] Jin-Ting Zhang National Univ. of Singapore [email protected] Zhengjun Zhang Univ. of Wisconsin at Madison [email protected] Zhigang Zhang Oklahoma State Univ. [email protected] Tian Zheng Columbia University [email protected] Jianhui Zhou University of Virginia [email protected] Hongtu Zhu Columbia University [email protected] Ji Zhu University of Michigan [email protected] Hui Zou University of Minnesota [email protected] Yijun Zuo Michigan State University [email protected]

Frontiers of Statistics

in honor of Professor Peter J. Bickel’s 65th Birthday

Edited by Jianqing Fan and Hira L. Koul

Imperial College Press

Table of Contents

1. Our Steps on the Bickel Way
   Kjell Doksum and Ya’acov Ritov ...... 1
   1.1 Introduction ...... 1
   1.2 Doing Well at a Point and Beyond ...... 2
   1.3 Robustness, Transformations, Oracle-free Inference, and Stable Parameters ...... 4
   1.4 Distribution Free Tests, Higher Order Expansions, and Challenging Projects ...... 4
   1.5 From Adaptive Estimation to Semiparametric Models ...... 5
   1.6 Hidden Markov Models ...... 6
   1.7 Non- and Semi-parametric Testing ...... 7
   1.8 The Road to Real Life ...... 8
   References ...... 8
   Bickel’s Publication ...... 11

Part I. Semiparametric Modeling

2. Semiparametric Models: A Review of Progress since BKRW (1993)
   Jon A. Wellner, Chris A. J. Klaassen and Ya’acov Ritov ...... 25
   2.1 Introduction ...... 25
   2.2 Missing Data Models ...... 28
   2.3 Testing and Profile Likelihood Theory ...... 28
   2.4 Semiparametric Mixture Model Theory ...... 29
   2.5 Rates of Convergence via Empirical Process Methods ...... 30
   2.6 Bayes Methods and Theory ...... 30
   2.7 Model Selection Methods ...... 31
   2.8 Empirical Likelihood ...... 32
   2.9 Transformation and Frailty Models ...... 32
   2.10 Semiparametric Regression Models ...... 33
   2.11 Extensions to Non-i.i.d. Data ...... 34
   2.12 Critiques and Possible Alternative Theories ...... 35
   References ...... 36

3. Efficient Estimator for Time Series
   Anton Schick and Wolfgang Wefelmeyer ...... 45
   3.1 Introduction ...... 45
   3.2 Characterization of Efficient Estimators ...... 47
   3.3 Autoregression Parameter ...... 50
   3.4 Innovation Distribution ...... 52
   3.5 Innovation Density ...... 54
   3.6 Conditional Expectation ...... 55
   3.7 Stationary Distribution ...... 57
   3.8 Stationary Density ...... 58
   3.9 Transition Density ...... 59
   References ...... 60

4. On the Efficiency of Estimation for a Single-index Model
   Yingcun Xia and Howell Tong ...... 63
   4.1 Introduction ...... 63
   4.2 Estimation via Outer Product of Gradients ...... 66
   4.3 Global Minimization Estimation Methods ...... 68
   4.4 Sliced Inverse Regression Method ...... 70
   4.5 Asymptotic Distributions ...... 71
   4.6 Comparisons in Some Special Cases ...... 73
   4.7 Proofs of the Theorems ...... 74
   References ...... 84

5. Estimating Function Based Cross-Validation
   M. J. van der Laan and Dan Rubin ...... 87
   5.1 Introduction ...... 87
   5.2 Estimating Function Based Cross-Validation ...... 90
   5.3 Some Examples ...... 96
   5.4 General Finite Sample Result ...... 101
   5.5 Appendix ...... 105
   References ...... 108

Part II. Nonparametric Methods

6. Powerful Choices: Tuning Parameter Selection Based on Power
   Kjell Doksum and Chad Schafer ...... 113
   6.1 Introduction: Local Testing and Asymptotic Power ...... 114
   6.2 Maximizing Asymptotic Power ...... 116
   6.3 Examples ...... 129
   6.4 Appendix ...... 134
   References ...... 139

7. Nonparametric Assessment of Atypicality
   Peter Hall and Jim W. Kay ...... 143
   7.1 Introduction ...... 144
   7.2 Estimating Atypicality ...... 145
   7.3 Theoretical Properties ...... 148
   7.4 Numerical Properties ...... 151
   7.5 Outline of Proof of Theorem 7.1 ...... 157
   References ...... 160

8. Selective Review on Wavelets in Statistics
   Yazhen Wang ...... 163
   8.1 Introduction ...... 163
   8.2 Wavelets ...... 164
   8.3 Nonparametric Regression ...... 166
   8.4 Inverse Problems ...... 172
   8.5 Change-points ...... 174
   8.6 Local Self-similarity and Non-stationary Stochastic Process ...... 176
   8.7 Beyond Wavelets ...... 179
   References ...... 179

9. Model Diagnostics via Martingale Transforms: A Brief Review
   Hira L. Koul ...... 183
   9.1 Introduction ...... 183
   9.2 Lack-of-fit Tests ...... 197
   9.3 Censoring ...... 201
   9.4 Khmaladze Transform or Bootstrap ...... 202
   References ...... 203

Part III. Statistical Learning and Bootstrap

10. Boosting Algorithms: with an Application to Bootstrapping Multivariate Time Series
    Peter Bühlmann and Roman W. Lutz ...... 209
    10.1 Introduction ...... 209
    10.2 Boosting and Functional Gradient Descent ...... 211
    10.3 L2-Boosting for High-dimensional Multivariate Regression ...... 217
    10.4 L2-Boosting for Multivariate Linear Time Series ...... 222
    References ...... 229

11. Bootstrap Methods: A Review
    S. N. Lahiri ...... 231
    11.1 Introduction ...... 231
    11.2 Bootstrap for i.i.d. Data ...... 233
    11.3 Model Based Bootstrap ...... 238
    11.4 Block Bootstrap ...... 240
    11.5 Sieve Bootstrap ...... 243
    11.6 Transformation Based Bootstrap ...... 244
    11.7 Bootstrap for Markov Processes ...... 245
    11.8 Bootstrap under Long Range Dependence ...... 246
    11.9 Bootstrap for Spatial Data ...... 248
    References ...... 250

12. An Expansion for a Discrete Non-Lattice Distribution
    Friedrich Götze and Willem R. van Zwet ...... 257
    12.1 Introduction ...... 257
    12.2 Proof of Theorem 12.1 ...... 262
    12.3 Evaluation of the Oscillatory Term ...... 271
    References ...... 273

Part IV. Longitudinal Data Analysis

13. An Overview on Nonparametric and Semiparametric Techniques for Longitudinal Data
    Jianqing Fan and Runze Li ...... 277
    13.1 Introduction ...... 277
    13.2 Nonparametric Model with a Single Covariate ...... 279
    13.3 Partially Linear Models ...... 283
    13.4 Varying-Coefficient Models ...... 291
    13.5 An Illustration ...... 293
    13.6 Generalizations ...... 294
    13.7 Estimation of Covariance Matrix ...... 296
    References ...... 299

14. Regressing Longitudinal Response Trajectories on a Covariate
    Hans-Georg Müller and Fang Yao ...... 305
    14.1 Introduction and Review ...... 305
    14.2 The Functional Approach to Longitudinal Responses ...... 311
    14.3 Predicting Longitudinal Trajectories from a Covariate ...... 313
    14.4 Illustrations ...... 316
    References ...... 321

Part V. Statistics in Science and Technology

15. Statistical Physics and Statistical Computing: A Critical Link
    James D. Servidea and Xiao-Li Meng ...... 327
    15.1 MCMC Revolution and Cross-Fertilization ...... 328
    15.2 The Ising Model ...... 328
    15.3 The Swendsen-Wang Algorithm and Criticality ...... 329
    15.4 Instantaneous Hellinger Distance and Heat Capacity ...... 331
    15.5 A Brief Overview of Perfect Sampling ...... 334
    15.6 Huber’s Bounding Chain Algorithm ...... 336
    15.7 Approximating Criticality via Coupling Time ...... 340
    15.8 A Speculation ...... 342
    References ...... 343

16. Network Tomography: A Review and Recent Developments
    Earl Lawrence, George Michailidis, Vijayan N. Nair and Bowei Xi ...... 345
    16.1 Introduction ...... 346
    16.2 Passive Tomography ...... 348
    16.3 Active Tomography ...... 352
    16.4 An Application ...... 359
    16.5 Concluding Remarks ...... 363
    References ...... 364

Part VI. Financial Econometrics

17. Likelihood Inference for Diffusions: A Survey
    Yacine Aït-Sahalia ...... 369
    17.1 Introduction ...... 369
    17.2 The Univariate Case ...... 371
    17.3 Multivariate Likelihood Expansions ...... 378
    17.4 Connection to Saddlepoint Approximations ...... 383
    17.5 An Example with Nonlinear Drift and Diffusion Specifications ...... 386
    17.6 An Example with Stochastic Volatility ...... 389
    17.7 Inference When the State is Partially Observed ...... 391
    17.8 Application to Specification Testing ...... 399
    17.9 Derivative Pricing Applications ...... 400
    17.10 Likelihood Inference for Diffusions under Nonstationarity ...... 400
    References ...... 402

18. Nonparametric Estimation of Production Efficiency
    Byeong U. Park, Seok-Oh Jeong, and Young Kyung Lee ...... 407
    18.1 The Frontier Model ...... 407
    18.2 Envelope Estimators ...... 409
    18.3 Order-m Estimators ...... 417
    18.4 Conditional Frontier Models ...... 421
    18.5 Outlook ...... 423
    References ...... 424

Part VII. Parametric Techniques and Inferences

19. Convergence and Consistency of Newton’s Algorithm for Estimating Mixing Distribution
    Jayanta K. Ghosh and Surya T. Tokdar ...... 429
    19.1 Introduction ...... 429
    19.2 Newton’s Estimate of Mixing Distributions ...... 431
    19.3 Review of Newton’s Result on Convergence ...... 432
    19.4 Convergence Results ...... 433
    19.5 Other Results ...... 438
    19.6 Simulation ...... 440
    References ...... 442

20. Mixed Models: An Overview
    Jiming Jiang and Zhiyu Ge ...... 445
    20.1 Introduction ...... 445
    20.2 Linear Mixed Models ...... 446
    20.3 Generalized Linear Mixed Models ...... 450
    20.4 Nonlinear Mixed Effects Models ...... 455
    References ...... 460

21. Robust Location and Scatter Estimators in Multivariate Analysis
    Yijun Zuo ...... 467
    21.1 Introduction ...... 467
    21.2 Robustness Criteria ...... 469
    21.3 Robust Multivariate Location and Scatter Estimators ...... 473
    21.4 Applications ...... 481
    21.5 Conclusions and Future Works ...... 484
    References ...... 485

22. Estimation of the Loss of an Estimate
    Wing Hung Wong ...... 491
    22.1 Introduction ...... 491
    22.2 Kullback-Leibler Loss and Exponential Families ...... 493
    22.3 Mean Square Error Loss ...... 495
    22.4 Location Families ...... 496
    22.5 Approximate Solutions ...... 498
    22.6 Convergence of the Loss Estimate ...... 502
    References ...... 506

Subject Index ...... 507
Author Index ...... 511

Special Thanks

to

Mary Beth Falke

Connie Brown

Zoya Kramer

and

Michael Bino, Lisa Glass, Noelina Hall, Kimberly Lupinacci
