Projections Onto Order Simplexes and Isotonic Regression
Volume 111, Number 2, March-April 2006
Journal of Research of the National Institute of Standards and Technology

[J. Res. Natl. Inst. Stand. Technol. 111, 000-000 (2006)]

Anthony J. Kearsley
Mathematical and Computational Science Division,
National Institute of Standards and Technology,
Gaithersburg, MD 20899

[email protected]

Isotonic regression is the problem of fitting data to order constraints. This problem can be solved numerically in an efficient way by successive projections onto order simplex constraints. An algorithm for solving the isotonic regression problem using successive projections onto order simplex constraints was originally suggested and analyzed by Grotzinger and Witzgall. This algorithm has been employed repeatedly in a wide variety of applications. In this paper we briefly discuss the isotonic regression problem and its solution by the Grotzinger-Witzgall method. We demonstrate that this algorithm can be appropriately modified to run on a parallel computer with substantial speed-up. Finally, we illustrate how it can be used to pre-process mass spectral data for automatic high-throughput analysis.

Key words: isotonic regression; optimization; projection; simplex.

Accepted: December 14, 2005
Available online: http://www.nist.gov/jres

1. Introduction

Given a finite set of real numbers, Y = \{y_1, \ldots, y_n\}, the problem of isotonic regression with respect to a complete order is the following quadratic programming problem:

    minimize    \sum_{i=1}^{n} w_i (x_i - y_i)^2
    subject to  x_1 \le \cdots \le x_n,                    (1)

where the w_i are strictly positive weights. Many important problems in statistics and other disciplines can be posed as isotonic regression problems. In epidemiology, binary longitudinal data are often collected in clinical trials of chronic diseases when interest is in assessing the effect of a treatment over time. Transitional models for longitudinal binary data subject to non-ignorable missing data can be developed, but parameter estimation must be done in conjunction with an isotonic regression (see Ref. [3]). In classical time series analysis applied to global climate prediction, complex processes are often modelled as three additive components: long-time trend, seasonal effect, and background noise. The long-time trend superimposed with the seasonal effect constitutes the mean part of the process. The important issue of mean stationarity is usually the first step for statistical inference. Researchers have developed a testing and estimation theory for the existence of a monotonic trend and the identification of important seasonal effects. The associated statistical inference and probabilistic diagnostics result from solving a problem generically called the change-point problem. This change-point problem initially arose in quality control assessment. It includes, for example, testing for changes in weather patterns and disease rates. Isotonic regression is necessary to test and estimate the trend and to determine periodic components (see Ref. [15]). For this application the isotonic regression yields estimators for the long-time trend with negligible influence from the seasonal effect, a desirable property. Recently, the development of algorithms for automatic peak and trough detection in raw mass spectrometer data has become an active research area. Motivated by the increased ability to produce high-quality data quickly, these algorithms have become crucial in the cost-effective analysis of data (see Refs. [12, 13]). One difficulty in applying automatic structure-revealing algorithms to raw spectroscopy data is the failure of the data to be monotonic (usually as a function of mass or time). This can occur for many reasons, including truncation and roundoff error, machine calibration errors, etc. In this case, isotonic regression has become an important pre-processing step before data analysis.

Clearly the isotonic regression problem is an optimization problem encountered frequently when dealing with data generated with any uncertainty, as is the case in many application problems in science and engineering. In their respective monographs, Barlow, Bartholomew, Bremner, and Brunk and Robertson, Wright, and Dykstra have written comprehensive surveys of this subject (see Refs. [1, 9]). The ideas of the work presented here are based on convex analysis. A comprehensive survey of convex analysis can be found in Ref. [10].

1.1 A Simple Example

In this section we describe a simple example of the isotonic regression problem. Suppose that {1, 3, 2, 4, 5, 7, 6, 8} is a given set of real numbers, and that the weights associated with these numbers are all identically one. This set is almost isotonic; however, {3, 2} and {7, 6} violate the nondecreasing requirement. One simple solution to this difficulty can be constructed by replacing each "block" of "violators" with the average of the numbers in the block. This produces {1, 2.5, 2.5, 4, 5, 6.5, 6.5, 8}, which turns out to be the unique solution of the isotonic regression problem. This is an example of the well-known "Pool Adjacent Violators" algorithm. One of the best-known examples of a pool adjacent violators algorithm for solving the isotonic regression problem is due to Grotzinger and Witzgall (Ref. [4]). In this important paper, the authors propose an efficient pool adjacent violators method that identifies data points violating the order constraint and, in turn, projects the violators onto simplex representations of the order constraints.

In this work, the formulation of the Pool Adjacent Violators algorithm due to Grotzinger and Witzgall has been altered slightly to run specifically on parallel computers. This algorithm has been implemented and tested on parallel machines (Ref. [7]).

2. A Decomposition Theorem

To attain the necessary rigor, we exploit a famous and very elegant characterization of the solution to the isotonic regression problem. Let W_j = \sum_{i=1}^{j} w_i, let P_0 denote the point (0, 0), and let P_j denote the point (W_j, \sum_{i=1}^{j} w_i y_i) for j = 1, \ldots, n. We interpret P_0, \ldots, P_n as points in the graph of a function, which we extend to the interval [0, W_n] by linear interpolation. Both the function and its graph are called the cumulative sum diagram (CSD) of the isotonic regression problem.

The greatest convex minorant (GCM) of a function f is the convex function defined by

    GCM[f] = \sup\{\phi : \phi \text{ convex}, \ \phi \le f\}.

It is a well-known and beautiful result that the isotonic regression problem is solved by taking x_j^* to be the left derivative of GCM[CSD] at W_j. Thus, theorems about isotonic regressions can be stated and proved as theorems about greatest convex minorants. Even though only elementary tools need be employed to prove the following theorem, it has profound implications for parallel computation.

Suppose that we decompose the set Y into Y_1 \oplus Y_2, where Y_1 = \{y_1, \ldots, y_k\} and Y_2 = \{y_{k+1}, \ldots, y_n\}. Analogously, we can decompose a function f with domain [0, W_n] into f_1 \oplus f_2, where f_1 is the restriction of f to [0, W_k] and f_2 is the restriction of f to (W_k, W_n]. Then the following result is easily demonstrated.

Theorem 1   GCM[GCM[f_1] \oplus GCM[f_2]] = GCM[f].

Proof: Since GCM[f_1] \le f_1 and GCM[f_2] \le f_2,

    GCM[f_1] \oplus GCM[f_2] \le f_1 \oplus f_2 = f.

It follows that, if \phi \le GCM[f_1] \oplus GCM[f_2], then \phi \le f, and hence that

    GCM[GCM[f_1] \oplus GCM[f_2]] \le GCM[f].        (2)

Conversely, suppose that \phi \le f is convex and write \phi = \phi_1 \oplus \phi_2. Then \phi_1 \le f_1 and \phi_2 \le f_2, so \phi_1 \le GCM[f_1] and \phi_2 \le GCM[f_2]. It follows that \phi \le GCM[f_1] \oplus GCM[f_2], and hence that

    GCM[f] \le GCM[GCM[f_1] \oplus GCM[f_2]].        (3)

Combining inequalities (2) and (3) gives the desired result.

2.1 Implications for Parallel Computation

If one takes the function f to be the CSD for the isotonic regression problem, then Theorem 1 states the following: decomposing Y into Y_1 \oplus Y_2, performing separate isotonic regressions on Y_1 and Y_2, and then performing a final isotonic regression on the combined result produces the isotonic regression on Y. Because the separate isotonic regressions on Y_1 and Y_2 can be performed simultaneously, parallel computation of isotonic regressions will be desirable if the final isotonic regression on the combined result is easy to compute. In point of fact, this is the case. Suppose that Y_1 satisfies y_1 \le \cdots \le y_k and Y_2 satisfies y_{k+1} \le \cdots \le y_n. If y_k \le y_{k+1}, then Y is isotonic. If Y is not isotonic, then it must be because some of the largest numbers in Y_1 exceed some of the smallest numbers in Y_2. The antidote to this difficulty is to identify this central block of offending numbers and to replace each of these numbers with the weighted average of the block. (This is just the Pool Adjacent Violators algorithm again.) To accomplish this, let

    m = \min\{i : y_i > y_{k+1}\}   and   M = \max\{i : y_i < y_k\}.

Furthermore, the more that the original problem is decomposed, the greater the communication costs of parallelization; in the extreme, the decomposition reduces to singleton values, in which case nothing whatsoever has been accomplished. Hence, it is impossible to anticipate the most efficient decomposition strategy in advance.

3. Application to the Analysis of Mass Spectral Data

Modern mass spectrometers are capable of producing large, high-quality data sets in brief periods of time (Ref. [14]). It is not uncommon for a synthetic polymer to produce a spectrum with hundreds of peaks. This motivates the design of automated data analysis algorithms capable of rapid and repeatable processing of raw mass spectrometer data. While many algorithms for the analysis of raw mass spectrometer data already exist, they all require significant operator input.
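The pool-adjacent-violators idea described in Sec. 1.1 is compact enough to state in code. The following is a minimal sketch of a weighted PAV solver for problem (1); it is an illustration only, not the Grotzinger-Witzgall projection algorithm of Ref. [4], and the function name `pav` and its list-based block representation are ours.

```python
def pav(y, w=None):
    """Weighted isotonic regression by pool-adjacent-violators.

    Solves: minimize sum_i w_i*(x_i - y_i)^2  s.t.  x_1 <= ... <= x_n.
    A minimal sketch for illustration, not the Grotzinger-Witzgall code.
    """
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool adjacent blocks while the order constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / wt, wt, c1 + c2])
    # Expand each pooled block back to its original length.
    x = []
    for mean, _, count in blocks:
        x.extend([mean] * count)
    return x

# The example from Sec. 1.1: blocks {3,2} and {7,6} are averaged.
print(pav([1, 3, 2, 4, 5, 7, 6, 8]))  # -> [1, 2.5, 2.5, 4, 5, 6.5, 6.5, 8]
```

Each input point starts as its own block; whenever the last two block means violate monotonicity, the blocks are merged and replaced by their weighted mean, exactly the "replace each block of violators with the average" step of the worked example.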
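The GCM characterization of Sec. 2 can also be exercised numerically: form the CSD points P_0, ..., P_n, take their lower convex hull (whose graph is GCM[CSD] at the breakpoints), and read off the left slope at each W_j. The sketch below assumes unit weights by default; the helper name `isotonic_via_gcm` and the hull routine are our illustration, not part of the paper.

```python
def isotonic_via_gcm(y, w=None):
    """Solve problem (1) via the greatest-convex-minorant characterization.

    Builds CSD points P_j = (W_j, sum_{i<=j} w_i*y_i), forms their lower
    convex hull (the GCM of the CSD), and returns the left slope of the
    hull at each W_j.  A sketch for illustration only.
    """
    if w is None:
        w = [1.0] * len(y)
    # CSD points, starting from P_0 = (0, 0).
    pts = [(0.0, 0.0)]
    for yi, wi in zip(y, w):
        X, S = pts[-1]
        pts.append((X + wi, S + wi * yi))
    # Lower convex hull via a monotone-chain sweep (x strictly increasing).
    hull = []
    for p in pts:
        while len(hull) >= 2:
            (x1, s1), (x2, s2) = hull[-2], hull[-1]
            # Drop the middle point if it lies on or above the chord,
            # i.e. if the incoming slope is >= the outgoing slope.
            if (s2 - s1) * (p[0] - x2) >= (p[1] - s2) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(p)
    # x_j^* = left derivative (segment slope) of the hull at W_j.
    out = []
    seg = 1
    for Wj, _ in pts[1:]:
        while hull[seg][0] < Wj:
            seg += 1
        (xa, sa), (xb, sb) = hull[seg - 1], hull[seg]
        out.append((sb - sa) / (xb - xa))
    return out

# Recovers the Sec. 1.1 solution from the left slopes of GCM[CSD].
print(isotonic_via_gcm([1, 3, 2, 4, 5, 7, 6, 8]))
# -> [1.0, 2.5, 2.5, 4.0, 5.0, 6.5, 6.5, 8.0]
```

Because the hull slopes are nondecreasing by construction, the output is automatically isotonic; agreement with the pooled-average answer is exactly the "left derivative of GCM[CSD] at W_j" result quoted in the text.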
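Theorem 1's consequence for parallel computation (Sec. 2.1) can be illustrated in a few lines: regress each half independently, concatenate, and regress once more on the combined result. In a real parallel code the two inner calls would run on separate processors; here they run sequentially, and the generic PAV routine below is our sketch rather than the parallel implementation of Ref. [7].

```python
def pav(y):
    """Unit-weight pool-adjacent-violators (generic sketch, cf. Sec. 1.1)."""
    blocks = []  # each block: [mean, count]
    for v in y:
        blocks.append([float(v), 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    return [m for m, c in blocks for _ in range(c)]

def isotonic_split(y, k):
    """Isotonic regression via the decomposition of Theorem 1:
    regress each half separately, then once more on the combined result.
    The two inner calls are independent and could run in parallel."""
    return pav(pav(y[:k]) + pav(y[k:]))

y = [1, 3, 2, 4, 5, 7, 6, 8]
# Any split point gives the same answer as a single global regression.
assert all(isotonic_split(y, k) == pav(y) for k in range(1, len(y)))
print(isotonic_split(y, 4))  # -> [1.0, 2.5, 2.5, 4.0, 5.0, 6.5, 6.5, 8.0]
```

The final pass is cheap in the common case: if the largest value of the regressed first half does not exceed the smallest value of the regressed second half, the concatenation is already isotonic and the merge does no pooling at all.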