Focustat: Focus Driven Statistical Inference with Complex Data
Total Page:16
File Type:pdf, Size:1020Kb
FocuStat: Focus Driven Statistical Inference with Complex Data CONTENT: FocuStat: The Task Force 2 The Tasks and Themes 3 The Task Focused Statistical Methods 4 Force Statistical Stories 12 Workshops and the VVV Conference 20 The FocuStat Research Kitchens 26 VIPs, Guests, Collaborators, Networking 28 FocuStat Excursions 30 The FocuStat Blog 32 Awards & Prizes 35 FocuStat in the Media 36 The core group, at a book discussion evening: G.H. Hermansen, E.Aa. Stoltenberg, V. Ko, Talks & Presentations 40 S.-E. Walker, B. Cuda, C. Cunen, K.H. Hellton, N.L. Hjort. Publications 44 FocuStat after FocuStat 48 FOCUSTAT HAS BEEN a five- 2016, 2018; Ko, Stoltenberg year project within the Statistics and Walker will defend theirs Division of the Department of in 2019. After their PostDoc Mathematics, University of Oslo terms, Hellton is a senior (UiO), funded by the Research researcher with the Norwegian Council of Norway, operating Computing Centre, Herman- from January 2014 to December sen a First Amanuensis with 2018. The directly funded core the BI Norwegian Business group has consisted of Professor School; also, Cunen is from 2019 Nils Lid Hjort (project leader), onwards a PostDoc with the Kristoffer Herland Hellton and Department of Mathematics, Gudmund Horn Hermansen UiO. Former Master students (PostDocs), Céline Cunen and Josephina Argyrou, Jonas Moss, Sam-Erik Walker (PhDs). Riccardo Parviero, Leiv Tore Salte Rønneberg (again, super- SEVERAL ASSOCIATED PhD vised or co-supervised by Hjort, students and Master students, and each of them having taken along with further colleagues up PhD positions afterwards) and collaborators, have also have worked with FocuStat been close to the core group. In related themes, as have Martin particular, PhD students Martin Tveten and yet others. For a Jullum, Vinnie Ko, Emil Aas full list of associated members, Stoltenberg (each supervised consult the Who We Are section or co-supervised by Hjort) at our website. Cover photo: Paul A. Røstad, 1967. have been active participants. Design & layout: geriljahansen.no Hermansen, Jullum, Cunen January 2019 defended their PhDs in 2014, 2 INTRODUCTION FocuStat: M3 M0 The M5 estimate M4 confidence curves M2 Tasks and M1 −0.0076 −0.0072 −0.0068 0.0 0.2 0.4 0.6 0.8 1.0 0 1234 0.00235 0.00240 0.00245 0.00250 Themes γ FIC To the left, a collection of confidence curves, with placebo versus drug studies. The two fatter curves in the middle relate to optimal combination of data information. To the right, a FIC plot, which evaluates and ranks different candidate models for a given purpose, related here to the slimming of minke whales over time. The more to the left, the better is the candidate model. STATISTICS IS THE science of elling for new types of analysis, and us in the same sub-project, we have reaching decisions under uncer- potentially also for new concepts of benefitted strongly from our inter- tainty and is in many respects a information and inference. Themes national workshops and research far-ranging success story, permeat- and challenges we have addressed kitchens. We have had such theme- ing nearly all substantive sciences share the concept of the focus, the based three-day workshops during and areas of society where data operational view that the science and the spring semesters of 2015, 2016, are collected. It has used around context drive the most important 2017, with about 25 participants for hundred years to reach its present questions which again should influ- each, and smaller-scaled three-day state of high maturity and uniform ence the optimal combinations of research kitchens during the autumn usefulness. In broad strokes, the models, their analysis, and the ensu- semesters of 2014, 2015, 2016, 2017, four main areas associated with ing decisions. Three such challenges 2018, with up to a dozen partici- are as follows: pants for each. Also, in May 2018 (1) parametrics (models indexed we organised a somewhat bigger by low-dimensional parame- A: “breaking the wall” between and broader four-day summing-up ters), areas (1) and (2), partly leaning conference, with about 45 scholars (2) nonparametrics (models with on recent advances inside area taking part. high- or infinite-dimensional (3). parameters), B: extending the current scope THE PRESENT REPORT sum- (3) assessment and selection of and catalogue of approaches marises and documents some of models, and methods of relevance to our efforts and achievements. More (4) combination of different area (4), including the use of information and further details sources of information confidence distributions. may be found at the FocuStat web- C: extending and developing new site, which will be kept active and drive most of modern statistics, and methodologies for areas (1) and growing also after the formal end have, in essence, been well sorted (3) for the by now frequently of the 2014-2018 project. Impor- out, conceptually and operationally. occurring situations where the tantly, our project has involved There are important gaps to be filled number of measurements per work on two fronts, so to speak, and new paradigms and principles to individual exceeds the sample with efforts directed both towards develop, however, when faced with size. developing methodology and the statistical challenges of the 21st towards applying these for substan- century. New types of data, related THE RESEARCH AREAS alluded tive questions in real-world appli- both to new types of substantive to here are wide and the soil is rich, cation areas. Communication, the questions in a changing society and with many fruitful tasks worthy task of conveying and interpreting to evolving technologies for monitor- of detailed study, regarding both both methodology and application ing and examining more complicated methodology and applications. In stories to a wider-than-statistics phenomena than earlier, create a addition to our individual efforts, audience, has also been a crucial need for new types of statistical mod- sometimes involving two or three of dimension. 3 Focused Statistical Methods Our FocuStat efforts have had at least two general but correlated dimensions and directions: developing and extending methodology, on one side, and applying the machinery for vital and fruitful application stories, on the other. In this section we give brief one-page summaries of certain methodological endeavours we’ve contributed to, in different focused directions. Some of our journal publications by necessity contain mathematical theorem-proof parts, for the specialists, but here we aim at conveying the basic motivation and usefulness of our ideas. These include model building and model selection, confidence curves, change points, personalised medicine, copulae for multivariate dependence, cure models, and robustness. FOCUSED STATISTICAL METHODS Céline Cunen: Confidence Curves for Dummies SOME FOCUSTAT MEMBERS confidence may be read off too; and have devoted efforts to developing we see the clear asymmetry in how 1.0 and popularising the concept of the confidence is distributed. In many confidene curve. The confidence other cases the confidence curve curve is an inferential summary. is symmetric, but here the distri- It presents the main result of a bution of the statistic informing confidence statistical analysis, for each of the us about the size of σ has a skewed 0.0 0.2 0.4 0.6 0.8 parameters we might be particu- distribution. The cc(σ) can also be 2 4 6 8 larly interested in, and constitutes used to read off p-values for tests. σ a foundation from which the user Confidence curve for σ. We read off the median confidence point estimate can draw real-world conclusions to AT THE BASIC level the confi- at 3.226, along with e.g. a 95 percent quantitative problems. dence curve is simply a plot of confidence interval, [1.879, 7.380]. Intervals at other levels of confidence nested confidence intervals at all can be read off too. ALL QUANTITATIVE RESEARCH- levels, from 0 to 100 percent. The ERS are familiar with point plot conveys a lot of information estimates, confidence intervals, quickly and is easy to understand. THE CONFIDENCE CURVE con- and p-values. A point estimate is Some users will interpret this as a stitutes a step away from classical an intelligent guess of the correct distribution of likely values, after frequentist testing, where the typ- value of the parameter in question; seeing the data; it is the canonical ical result is simply a decision on a confidence interval gives a set of frequentist cousin to the Bayesian’s whether to reject the null hypothe- likely values; a p-value allows the posterior density (but there is no sis or not, based on some 0.05 type user to conclude a hypothesis test. prior camel to swallow). It may also significance rule. The confidence The confidence curve offers these have a positive probability mass at curve promotes transparency: things, and more. a boundary point of the parameter given the same model, data, and space, e.g. at zero for a variance assumptions, different users should HERE’S A SIMPLE example. parameter. arrive at the same confidence You observe the data points 4.09, curve, but might nonetheless draw 6.37, 6.89, 7.86, 8.28, 13.13 from CONFIDENCE CURVES ARE different real-world conclusions a normal distribution and wish particularly fruitful in the context in the end. At any rate the use of to assess the underlying spread of meta-analysis, where several confidence curves should discour- parameter, the standard deviation sources inform on the same param- age the automatic use of statistical σ. The confidence curve, say cc(σ), eter of interest, see the figure on methods and hopefully encourage sums up what can be told from the page 3. There we display a confi- critical thinking.