Idescat. SORT. Twenty Years of P-Splines. Volume 39
Statistics & Operations Research Transactions
SORT 39 (2) July-December 2015, 149-186
© Institut d’Estadística de Catalunya
ISSN: 1696-2281 eISSN: 2013-8830
[email protected] www.idescat.cat/sort/

Twenty years of P-splines

Paul H.C. Eilers1, Brian D. Marx2 and Maria Durbán3

Abstract

P-splines first appeared in the limelight twenty years ago. Since then they have become popular in applications and in theoretical work. The combination of a rich B-spline basis and a simple difference penalty lends itself well to a variety of generalizations, because it is based on regression. In effect, P-splines allow the building of a “backbone” for the “mixing and matching” of a variety of additive smooth structure components, while inviting all sorts of extensions: varying-coefficient effects, signal (functional) regressors, two-dimensional surfaces, non-normal responses, quantile (expectile) modelling, among others. Strong connections with mixed models and Bayesian analysis have been established. We give an overview of many of the central developments during the first two decades of P-splines.

MSC: 41A15, 41A63, 62G05, 62G07, 62J07, 62J12.

Keywords: B-splines, penalty, additive model, mixed model, multidimensional smoothing.

1. Introduction

Twenty years ago, Statistical Science published a discussion paper under the title “Flexible smoothing with B-splines and penalties” (Eilers and Marx, 1996). The authors were two statisticians with only a short track record, who finally got a manuscript published that had been rejected by three other journals. They had been trying since 1992 to sell their brainchild P-splines (Eilers and Marx, 1992). Apparently it did have some value, because two decades later the paper has been cited over a thousand times (according to the Web of Science, a conservative source), in both theoretical and applied work.
By now, P-splines have become an active area of research, so it will be useful, and hopefully interesting, to look back and to sketch what might be ahead.

1 Erasmus University Medical Centre, Rotterdam, the Netherlands, [email protected]
2 Dept. of Experimental Statistics, Louisiana State University, USA, [email protected]
3 Univ. Carlos III Madrid, Dept of Statistics, Leganés, Spain, [email protected]
Received: October 2015

P-splines simplify the work of O’Sullivan (1986). He noticed that if we model a function as a sum of B-splines, the familiar measure of roughness, the integrated squared second derivative, can be expressed as a quadratic function of the coefficients. P-splines go one step further: they use equally-spaced B-splines and discard the derivative completely. Roughness is expressed as the sum of squares of differences of coefficients. Differences are extremely easy to compute and generalization to higher orders is straightforward.

The plan of the paper is as follows. In Section 2 we start with a description of basic P-splines, the combination of a B-spline basis and a penalty on differences of coefficients. The penalty is the essential part, and in Section 3 we present many penalty variations to enforce desired properties of fitted curves. The penalty is tuned by a smoothing parameter; it is attractive to have automatic and data-driven methods to set it. Section 4 presents model diagnostics that can be used for this purpose, emphasizing the important role of the effective model dimension. We present the basics of P-splines in the context of penalized least squares and errors with a normal distribution. For smoothing with non-normal distributions, it is straightforward to adapt ideas from generalized linear models, as is done in Section 5. There we also lay connections to GAMLSS (generalized additive models for location, scale and shape), where not only the means of conditional distributions are modelled.
We will see that P-splines are also attractive for quantile and expectile smoothing. The first step towards multiple dimensions is the generalized additive model (Section 6). Not only can smoothing be used to estimate trends in expected values (and other statistics), but it also can be used to find smooth estimates for regression coefficients that change with time or another additional variable. The prototypical case is the varying-coefficient model (VCM). We discuss the VCM in Section 7, along with other models like signal regression. In modern jargon these are examples of functional data analysis. In Section 8, we take the step to full multidimensional smoothing, using tensor products of B-splines and multiple penalties. In Section 9, we show how all the models from the previous sections can be added to each other and so combined into one structure. Here again the roots in regression pay off. One can appreciate the penalty as just a powerful tool. Yet it is possible to give it a deeper meaning. In Section 10, P-splines are connected to mixed models. This leads to further insights, as well as to new algorithms for finding reasonable values for the penalty parameters. From the mixed model perspective, it is just a small step to a Bayesian approach, interpreting the penalty as (minus) the logarithm of the prior distribution of the B-spline coefficients. This is the subject of Section 11. Asymptotics and boosting do not have a natural place in other sections, so we put them together in Section 12, while computational issues and availability of software are discussed in Section 13. We close the paper with a discussion.

As far as we know, this is the first review on P-splines. Earlier work by Ruppert et al. (2009) took a broader perspective, on the first five years after appearance of their book (Ruppert et al., 2003). We do not try to be exhaustive. That would be impossible (and boring), given the large number of citations.
With the availability of Google Scholar and commercial citation databases such as Scopus and the Web of Science, anyone can follow the trail through history in detail.

We have done our best, in good faith, to give an overview of the field, but we do not claim that our choice of papers is free from subjectivity. The advent of P-splines has led to formidable developments in smoothing, and we have been actively shaping many of them. We hope that we will not offend any reader by serious omissions.

2. P-spline basics

The two components of P-splines are B-splines and discrete penalties. In this section we briefly review them, starting with the former. We do not go much into technical detail; see Eilers and Marx (2010) for that.

2.1. B-splines

Figure 1 shows four triangles of the same height and width, the middle ones overlapping with their two neighbours. These are linear B-splines, the non-zero parts consisting of two linear segments. Imagine that we scale the triangles by different amounts and add them all up. That would give us a piecewise-linear curve. We can generate many shapes by changing the coefficients, and we can get more or less detail by using more or fewer B-splines. If we indicate the triangles by $B_j(x)$ and if $a_1$ to $a_n$ are the scaling coefficients, we have $\sum_{j=1}^{n} a_j B_j(x)$ as the formula for the function. This opens the door to fitting data pairs $(x_i, y_i)$ for $i = 1, \ldots, m$. We minimize the sum of squares

$$S = \sum_i \Bigl( y_i - \sum_j a_j B_j(x_i) \Bigr)^2 = \| y - Ba \|^2,$$

where $B = [b_{ij}]$, the so-called basis matrix. This is a standard linear regression problem and the solution is well known: $\hat{a} = (B^T B)^{-1} B^T y$. The flexibility can be tuned by changing the width of the triangles (and hence their number).

[Figure 1 panel title: Individual linear B-splines (offset vertically)]
Figure 1: Linear B-splines illustrated. The individual splines are offset for clarity. In reality the horizontal sections are zero.

A piecewise-linear fit to the data may not be pleasing to the eye, nor be suitable for computing derivatives (which would be piecewise-constant). Figure 2 shows quadratic B-splines, each formed by three quadratic segments. The segments join smoothly. In a similar way cubic B-splines can be formed from four cubic segments. The recipe for forming a curve and fitting the coefficients to data stays the same.

[Figure 2 panel title: Individual quadratic B-splines (offset vertically)]
Figure 2: Quadratic B-splines illustrated. The individual splines are offset for clarity. In reality the horizontal sections are zero.

The positions at which the B-spline segments join are called the knots. In our illustrations the knots are equally-spaced and so all B-splines have identical shapes. This is not mandatory for general B-splines, but rather it is a deliberate choice for P-splines, as it makes the construction of penalties trivial.

One should take care when computing the B-splines. The upper panel of Figure 3 shows a basis using equally-spaced knots. Note the “incomplete” B-splines at both ends, of which not all segments fall within the domain of x. The lower panel shows a basis as computed by the R function bs(). It has so-called multiple knots at both ends and therefore is unsuitable for P-splines. To avoid this, one should specify an enlarged domain, and cut off the splines at both ends, by removing the corresponding columns in the basis matrix.
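
As a rough illustration of the recipe in this section, the following Python sketch builds an equally-spaced B-spline basis matrix, with entries taken to be the B-spline values B_j(x_i), and fits the coefficients by ordinary least squares. It is not the R function bs(), and it is not taken from the paper: the function names bspline_basis, solve and fit_least_squares, and the arguments xl, xr, nseg and deg, are hypothetical and introduced only for this example. Instead of computing a larger basis and removing columns, the sketch extends the knot grid by deg intervals beyond each end of the domain; this is assumed to serve the same purpose, so that all B-splines have identical shapes and no multiple knots occur at the ends.

```python
def bspline_basis(x, xl, xr, nseg, deg):
    """Basis matrix (list of rows) of equally-spaced B-splines on [xl, xr].

    Hypothetical helper: the knot grid is extended by `deg` intervals beyond
    each end of the domain, giving nseg + deg B-splines of identical shape.
    Intended for deg >= 1.
    """
    dx = (xr - xl) / nseg
    knots = [xl + (k - deg) * dx for k in range(nseg + 2 * deg + 1)]
    rows = []
    for xi in x:
        # Degree 0: indicator of the knot interval that contains xi.
        b = [1.0 if knots[k] <= xi < knots[k + 1] else 0.0
             for k in range(len(knots) - 1)]
        # Cox-de Boor recursion, simplified for equally-spaced knots.
        for d in range(1, deg + 1):
            b = [((xi - knots[k]) * b[k]
                  + (knots[k + d + 1] - xi) * b[k + 1]) / (d * dx)
                 for k in range(len(b) - 1)]
        rows.append(b)
    return rows


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def fit_least_squares(B, y):
    """Coefficients solving the normal equations (B'B) a = B'y."""
    n = len(B[0])
    BtB = [[sum(row[i] * row[j] for row in B) for j in range(n)]
           for i in range(n)]
    Bty = [sum(row[i] * yi for row, yi in zip(B, y)) for i in range(n)]
    return solve(BtB, Bty)


# Toy data on the line y = 2x + 1, fitted with linear B-splines on [0, 4].
x = [i / 4 for i in range(17)]
y = [2 * xi + 1 for xi in x]
B = bspline_basis(x, 0.0, 4.0, nseg=4, deg=1)
a_hat = fit_least_squares(B, y)
```

In this toy example the data lie exactly on a straight line, so the fitted coefficients of the linear B-splines should reproduce the values of the line at the knots 0, 1, ..., 4, up to rounding error. The small solver is only there to keep the sketch self-contained; for real data a numerical linear algebra library would be the more sensible choice.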