The Use of Matrix Methods in the Modeling of Spectroscopic Data Sets
Total Page:16
File Type:pdf, Size:1020Kb
652 Biophysical Journal Volume 72 February 1997 652-673 The Use of Matrix Methods in the Modeling of Spectroscopic Data Sets Eric R. Henry Laboratory of Chemical Physics, National Institutes of Health, Bethesda, Maryland 20892-0520 ABSTRACT We describe a general approach to the model-based analysis of sets of spectroscopic data that is built upon the techniques of matrix analysis. A model hypothesis may often be expressed by writing a matrix of measured spectra as the product of a matrix of spectra of individual molecular species and a matrix of corresponding species populations as a function of experimental conditions. The modeling procedure then requires the simultaneous determination of a set of species spectra and a set of model parameters (from which the populations are derived), such that this product yields an optimal description of the measured spectra. This procedure may be implemented as an optimization problem in the space of the (possibly nonlinear) model parameters alone, coupled with the efficient solution of a corollary linear optimization problem using matrix decomposition methods to obtain a set of species spectra corresponding to any set of model parameters. Known species spectra, as well as other information and assumptions about spectral shapes, may be incorporated into this general framework, using parametrized analytical functional forms and basis-set techniques. The method by which assumed relationships between global features (e.g., peak positions) of different species spectra may be enforced in the modeling without otherwise specifying the shapes of the spectra will be shown. We also consider the effect of measurement errors on this approach and suggest extensions of the matrix-based least-squares procedures applicable to situations in which measurement errors may not be assumed to be normally distributed. A generalized analysis procedure is introduced for cases in which the species spectra vary with experimental conditions. INTRODUCTION The understanding of biochemical processes often relies on might predict the following such dependence: the use of models that describe the underlying mechanisms in terms of interconversions among hypothetical molecular species. The study of these processes using spectroscopic c([X]) = + X] CBX([X]) = + X] (2) methods then requires that the observed spectra be related to the populations and intrinsic spectra of these model species. where co is the (known) total concentration of both species A common example of such modeling appears in the anal- and K is an equilibrium constant that remains to be deter- ysis of spectrophotometric titrations. In a simple case, the mined. We would then attempt to estimate the parameter K binding of a ligand X to a molecule B-i.e., the intercon- and the species spectra SB and SBX by performing a set of version between two molecular species B <-* BX-is mon- simultaneous least-squares fits using the measured spectral itored by measuring spectra A(A,[X]) as functions of wave- amplitudes at different wavelengths as a function of [X]: length A, and the concentration of ligand [X]. If the spectra Ai of the two pure species, SB(A) and SBX(A), are known a CO(SB(Ai) + K[X]SBX(Ai)) priori, then the species concentrations cB([X]) and cBx([X]) A(Ai,A(A1,[XI)[]) ± (3) are determined in a "model-free" fashion by performing a -~ I KIX](3 least-squares fit Here the amplitudes SB(Ai) and SBX(Ai) of the species spec- A(A, [X]) z CB([X])SB(A) + CBX([X])SBX(A) (1) tra at wavelength Ai, as well as the equilibrium constant K, are treated as adjustable parameters, and the value of K is independently at each value of [X]. If, on the other hand, the required to be consistent among all of the single-wavelength species spectra are not known, it is still possible to deter- fits. mine both the species concentrations and the species spectra The solution of such a spectroscopic modeling problem from a set of spectra {A} measured for different values of generally requires some procedure to vary both the spectra [X], provided some model is assumed for the dependence of of the species and the adjustable parameters in the model, the species concentrations on [X]. For this titration, we from which the populations of the species as a function of experimental conditions are computed, to reproduce the observed variations of the measured spectra with experi- mental conditions. Because the model parameter K in the Received for publication 12 August 1996 and in final form 8 November of the set of 1996. above example is independent wavelength, fits 3 uses the same model Address reprint requests to Dr. Eric R. Henry, National Institutes of Health, least-squares represented by Eq. Bldg. 5, Room 110, Bethesda, MD 20892-0520. Tel.: 301-496-6031; Fax: species populations at each wavelength. By arranging the 301-496-0825; E-mail: [email protected]. measured spectra for different values of [X] as the columns C 1997 by the Biophysical Society of a matrix, we can separate out the unknown species 0006-3495/97/02/652/22 $2.00 spectra from the model species populations. The set of Henry Matrix-Based Modeling of Spectra 653 simultaneous least-squares fits in Eq. 3 may then be written the spectroscopic modeling problem is presented in A Gen- eralized Spectroscopic Modeling Problem, which with some A(Xk, [X]i) ... A(Am, [X]n) ) assumptions allows the species spectra (rather than the populations alone) to vary with experimental conditions. Detailed mathematical developments in support of the dis- A(A,III [X]I) . .. A(Akmg [X]n) (4) cussion are presented in three appendices at the end of the article. SB(Al) SBX(Al) ) PB(K, [X],) ... pB(K, [X].) PBX(K, [X]1) ... PBX(K, [X].) SB(Amn) SBx(Am) DEFINITION AND FEATURES OF THE SPECTROSCOPIC MODELING PROBLEM We consider a general set of spectra {Ai} (e.g., optical where pi(K, [X]) is the population of species i predicted by absorption or Raman spectra) which have been obtained for the model for equilibrium constant K and ligand concentra- a system under various sets of measurement conditions, and tion [X]. Being able to write the problem in matrix form is assume that these spectra are functions of a single spectro- more than a notational convenience. Procedures for solving scopic parameter A (e.g., wavelength). The set of spectra such modeling problems may, in many cases, exploit this may be written in the compact form {Ai} = {A(A, {xi})}, matrix organization of spectra and populations to consider- where {xi} is the set of measurement conditions, one for able advantage, in terms of both algorithmic efficiency and each spectrum. When several experimental conditions are the compactness of the stored representation of the data being varied, each xi may be a set of values of distinct (Henry and Hofrichter, 1992). In this paper we describe an experimental parameters, e.g., xi = {time(i), pH(i), tempera- approach to the solution of the spectroscopic modeling ture(i),... }. In the general modeling problem, a set of problem that is built upon the concepts and procedures of model "species" is postulated, as well as some prescription matrix analysis. It should be emphasized that a properly pj = Mj({xi}, { k}) for computing the population of species constructed matrix-based modeling procedure should be j as a function of the experimental parameters {xi} and some equivalent in all meaningful ways to a "global analysis" set of model parameters {f k}. Associated with each species approach (Beechem, 1992) as embodied in Eq. 3, which j is a spectrum Sj(A), which is assumed not to vary with does not rely upon a matrix organization of the data. It is experimental conditions. (This assumption will be relaxed hoped that the efficiencies offered by a matrix viewpoint somewhat in A Generalized Spectroscopic Modeling Prob- will make demanding analyses of larger data sets more lem.) Measured spectra must be written as sums of the practical. species spectra weighted by the respective species popula- Some of the ideas and techniques presented here are not tions, that is new. Our goal in this paper is to integrate existing ideas with certain new techniques and perspectives, which have been N A(A, (5) developed in treating specific modeling problems, into a {xi}) j=lE Sj(A)Mjj({xi}, {&k}) "handbook" that might prove useful to others. The proce- dures presented here have been applied most recently to the To proceed, we arrange the set of scalars {A(A, {xi})}, analysis of time-resolved optical absorption spectra of pho- measured for a fixed set of n. spectroscopic parameter tolyzed hemoglobin using complex kinetic models (Henry values A, in the form of a matrix A. Each column of A is a et al., 1997). However, the present discussion has been kept single measured spectrum, so that each row of A corre- as general as possible and makes no direct assumptions sponds to a single fixed A, and each column is identified about the types of spectra and models being considered. with one of np sets of experimental parameters xi: The organization of the paper is as follows: The spectro- scopic modeling problem in matrix form and outline of the A(Al, {Xl}) A(A1, {X2}) ..A(Ak , {Xnpl) general approach to solving it are introduced in Definition and Features and Direct versus SVD-Based Analysis sec- A(A2, {x1}) A(A2, {X2}) ... A(A2, {xnp}) A A= tions. In Incorporating Known Species Spectra we discuss (6) ... ... .. methods to incorporate known or assumed features of spe- cies spectra into the matrix-based a {X1} modeling priori. More A(Akn,, ) A(Ank, {x2}) ..A(AnA, {Xnpl)I careful consideration is then given in Spectroscopic Mod- eling to the role of measurement errors in defining the In a similar fashion we define a n.