
Determination of Exponential Parameters

K. Irene Snyder Wesley E. Snyder

Center for Communications and Signal Processing, Department of Electrical and Computer Engineering, North Carolina State University

TR-91/11, August 1991

K. Irene Snyder, Department of Psychology, University of North Carolina at Chapel Hill

Wesley E. Snyder, Department of Radiology, Bowman Gray School of Medicine, Winston-Salem, NC

The problem of finding the parameters of exponential processes is discussed. Difficulties with numerical approaches based on descent are illustrated. A new method for finding those parameters is presented, which requires virtually no manual intervention, and which can find globally optimal estimates.

1 Introduction

Researchers in perception and psychophysics often find it necessary to plot data points or measurements in logarithmic curves against time. Displacement threshold determination[31], tilt aftereffect growth and decay[12,21], contrast and light adaptation[37], vibrotactile sensitivity[15], and many others produce data that is best fit by logarithmic curves. Stevens' power law and Atkinson's more general equation[2] show a generally exponential/logarithmic relationship between sensation magnitude and stimulus intensity.

Quantification of the physiological processes underlying the observed behavior often requires determining the parameters of an exponential function; however, a difficulty lies in determining the best fitting curve for data on an exponential or logarithmic curve, since data unfortunately rarely lies precisely on a simple exponential curve. In this paper, we address the fundamental problem of finding the best estimate of the parameters of an exponential-type function, given noisy measurements of that function. The resulting technique is demonstrated here for simple exponential functions, but is generally applicable to more complex functions such as those described by Atkinson[2]. Given an exponential function such as

    y = a + b e^{cx},

to find the values of the parameters a, b and c which provide the best fit, we find the curve which minimizes the error, or the distance between the data points and the curve. We describe this error by

    E = \sum_i [y_i - (a + b e^{c x_i})]^2     (1.1)

where the sum is taken over a set of measurements. Here, E is referred to as the mean squared error. The easiest way to find the minimum of such an expression is to find the zeroes of the derivative, and differentiating this equation gives us three derivatives:

    dE/da = -2 \sum_i [y_i - (a + b e^{c x_i})]
    dE/db = -2 \sum_i [y_i - (a + b e^{c x_i})] e^{c x_i}     (1.2)
    dE/dc = -2 \sum_i [y_i - (a + b e^{c x_i})] b x_i e^{c x_i}

An algebraic solution to this simultaneous system of equations is intractable. Thus, we are forced to consider numerical methods.

1.1 Gradient Descent

Gradient descent, a standard numerical technique for optimization, finds the minimum by stepping slowly downhill, since the gradient always points away from the minimum (or at least, from some minimum). Given some scalar function E(x) and a current estimate (at iteration k) x^{(k)}, to find the downhill point x^{(k+1)}, gradient descent uses

    x^{(k+1)} = x^{(k)} - E'(x^{(k)}) / E''(x^{(k)})

where E' and E'' are the first and second derivatives of E with respect to x. There are a number of problems inherent in gradient descent-type methods:

• Parameter sensitivity. The ranges of the parameters, and their sensitivity to perturbations, may easily differ by orders of magnitude. For example, the parameter c described in the equations above is critical to the stability of the algorithm, and very sensitive to small errors. To compensate for this sensitivity, we divide the gradient by the second derivative (in the scalar case), or by some norm of the Hessian for more complex problems. While use of the second derivatives solves some of the sensitivity problem, it introduces a second problem: algebraic tedium.

• Representational complexity. To perform gradient descent, one must analytically evaluate first and second partial derivatives. This algebraic process, while straightforward, is tedious and prone to error. Fortunately, there are now a number of symbolic math software packages which can be used to automate this process. For the many users who do not have easy access to such tools, the simple process of correctly differentiating complex expressions is tedious and time consuming at best.

• Plateaus. Exponential functions have a notorious tendency to produce plateaus in the corresponding MSE fit functions. For example, Figure 1 illustrates the error measure of Equation (1.1) plotted vs. c, with a and b held at their optimal values. The slope of this curve becomes arbitrarily small as one moves to the left along the c axis. If one chooses a stopping criterion for gradient descent such as "stop if the magnitude of the gradient is less than T", it is trivial to find such a point by moving along the c axis away from the minimum.

• Local minima. The primary flaw with gradient descent as a means of solving this type of minimization problem is that (unless it gets caught on a plateau) it finds the minimum nearest to the starting point, which may or may not be the absolute minimum.
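To make Equations (1.1) and (1.2) concrete, the short sketch below writes them out in Python for a synthetic data set; the arrays, noise level, and "true" parameter values are invented for illustration and are not taken from this paper. It also exposes the plateau discussed above: when c is very negative, e^{c x_i} is essentially zero for every sample, so the gradient components involving b and c nearly vanish.

    import numpy as np

    # Hypothetical noisy measurements of y = a + b*exp(c*x); the "true" values are invented.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 5.0, 50)
    y = (30.0 - 10.0 * np.exp(-0.8 * x)) + rng.normal(0.0, 0.5, x.size)

    def mse(params):
        # Equation (1.1): E = sum_i [y_i - (a + b*exp(c*x_i))]^2
        a, b, c = params
        return np.sum((y - (a + b * np.exp(c * x))) ** 2)

    def grad(params):
        # The three partial derivatives of E (Equation (1.2))
        a, b, c = params
        e = np.exp(c * x)
        r = y - (a + b * e)
        return np.array([-2.0 * np.sum(r),               # dE/da
                         -2.0 * np.sum(r * e),           # dE/db
                         -2.0 * np.sum(r * b * x * e)])  # dE/dc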

1.2 Simulated Annealing

The minimization technique known as simulated annealing [19,1] avoids the problems of local minima and plateaus, and may be described as follows:

1. Choose (at random) an initial value of x.
2. Generate a point y which is a "neighbor" of x.
3. If E(y) < E(x), y becomes the new value of x.
4. If E(y) > E(x), compute P_y = exp(-(E(y) - E(x))/T). If P_y > R (for R a random number uniformly distributed between 0 and 1), then accept y.
5. Decrease T slightly.
6. If T > T_min, go to 2.

In step 3, we perform a descent, so that we always fall "downhill". In step 4, we make it possible to sometimes move uphill and get out of a valley. Initially, we ignore T and note that if y represents an uphill move, the probability of accepting y is proportional to

e^{-(E(y) - E(x))/T}. Thus, uphill moves can occur, but are exponentially less likely to occur as the size of the uphill move becomes larger. The likelihood of an uphill move is, however, strongly influenced by T. If T is very large, all moves will be accepted. As T is gradually reduced, uphill moves become less likely until, for low values of T (T << E(y) - E(x)), such moves essentially cannot occur.
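A minimal sketch of the six-step procedure above, written for a single scalar variable. The Gaussian neighborhood, the initial temperature, and the geometric cooling factor are illustrative assumptions of this sketch, not prescriptions of the procedure itself.

    import math
    import random

    def simulated_annealing(E, x0, T=10.0, T_min=1e-3, cool=0.99):
        x = x0                                    # step 1: initial value of x
        while T > T_min:                          # step 6: repeat until T falls to T_min
            y = x + random.gauss(0.0, 1.0)        # step 2: a "neighbor" of x (assumed Gaussian)
            if E(y) < E(x):                       # step 3: always accept downhill moves
                x = y
            else:                                 # step 4: accept uphill moves if P_y > R
                if math.exp(-(E(y) - E(x)) / T) > random.random():
                    x = y
            T *= cool                             # step 5: decrease T slightly
        return x

    # e.g. simulated_annealing(lambda v: (v - 3.0) ** 2, x0=-10.0)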

In the case of combinatorial optimization, where all the variables take on only one of a small number (usually two) of possible values, the "neighbor" of a vector x_1 is another vector x_2 such that only one element of x_1 is changed to create x_2. For problems (such as fitting an exponential) where the variables take on continuous values, it is much more difficult to quantify the concept of "neighbor".

Figure 1. Fit error vs. the exponential parameter c (a and b held at their optimal values).

In the next section, we describe a new minimization strategy which handles continuously-valued variables.

2 Continuous Optimization (tree annealing)

Although some work has been done on extending SA to problems with continuous variables[36], the nature of the SA algorithm makes it best suited for solving problems in combinatorial optimization, in which the variables take on only discrete values. This is primarily due to the difficulty in specifying, for a particular problem, precisely what the "neighborhood" of a continuously-valued variable is. We will not, in this paper, attempt to survey all the other (that is, not based on SA) methods for continuous optimization. See [8] for more information.

In this section, we discuss a method for finding the minimum of functions of continuously-valued variables. We find it convenient to think of the optimization problem as a search: the minimum lies somewhere in a bounded hyperspace of dimension d. It is not practical to use any sort of array structure to store a representation of such a space, since the storage rapidly becomes prohibitive. Instead, we use a dynamic data structure.

The minimization method described here, which we call "tree annealing", is an extension of the familiar Metropolis [23] algorithm of simulated annealing, but handles continuously valued variables in a natural way. We assume we are searching for the minimum of some function H(x), where the d-dimensional vector x has continuously-valued elements. Furthermore, we assume a bounded search space S \subset R^d, which we represent with this dynamic data structure. We use a k-d tree in which each level of the tree represents a binary partition of one particular degree of freedom (DOF). Each node may thus be interpreted as representing a hyperrectangle, and its children therefore represent the smaller hyperrectangles resulting from dividing the parent along one particular DOF. In Figure 2, we illustrate a one-dimensional energy function, and show how a resulting partition tree provides more resolution in the vicinity of minima.

Figure 2. A one-dimensional search space and corresponding partition tree. It is important to remember that the tree is built using a random process; therefore, the tree is likely to possess more depth (resolution) in the vicinity of minima, and this figure shows only what the tree is likely to be.

In Figure 3, we illustrate a 2-D example of the partition tree, and show how nodes in the tree correspond to successive refinements.

Figure 3. A two-dimensional example (partitions shown at levels l = 0 and l = 2, with k = 0).

In general, for a d-dimensional problem, a node at level l will divide the kth degree of freedom in half, where k = l mod d. The two children of each node represent the subspaces resulting from dividing the parent subspace in half along the kth DOF. For example, in a 6-DOF problem, suppose the domain of DOF 4 is 0 <= x_4 <= 1. Then a node at level 22 represents a partition of DOF 4, a partition whose width is 2^{-\lceil 22/6 \rceil} = 0.0625.

The Metropolis algorithm proceeds as follows:(1)

1. Given a current estimate x in S, generate a "neighboring" point y in S, with generation probability g(y|x). It is necessary to assume that g is symmetric in the following sense: g(y|x) = g(x|y).

2. Accept the point y as the new estimate with probability

    min(1, p(y)/p(x)),     (2.1)

where the probabilities are Gibbs, i.e., of the form p(.) \propto exp(-H(.)/T).

Consider, for a moment, the form of Equation (2.1). In it, we point out that the decision to accept or reject y is based on a probability. A moment's reflection will convince the reader that this is equivalent to steps 3 and 4 of the simulated annealing algorithm. Omitting for the moment any discussion of annealing, that is, of reducing T, we consider how the simulated annealing strategy is modified by this "tree annealing" (TA) algorithm.

2.1 Growing and Searching the Tree

While the tree represents the entire search space, we build it with finer resolution in the vicinity of local minima. To see how this works, let us assume the process has been running for a while, and the tree has already been partially built. At each node, two numbers have been stored, n_L and n_R, which represent how many times in the past an acceptable point has been found in the left and right subtrees, respectively, of that node.

(1) This description is slightly different from the one presented earlier, since we want to emphasize the importance of the generation process.
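One plausible way to realize the nodes of this partition tree is sketched below. The field names and the choice to initialize n_L and n_R to 1 (so that unexplored subtrees retain a nonzero selection probability) are our own assumptions, not the authors' implementation.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Node:
        lo: List[float]                 # lower corner of the node's hyperrectangle
        hi: List[float]                 # upper corner of the node's hyperrectangle
        level: int = 0                  # depth l; this node splits DOF k = l mod d
        nL: int = 1                     # past acceptances found in the left subtree
        nR: int = 1                     # past acceptances found in the right subtree
        left: Optional["Node"] = None
        right: Optional["Node"] = None

        def split(self):
            # Halve this hyperrectangle along DOF k = l mod d, creating two children.
            k = self.level % len(self.lo)
            mid = 0.5 * (self.lo[k] + self.hi[k])
            left_hi = list(self.hi); left_hi[k] = mid
            right_lo = list(self.lo); right_lo[k] = mid
            self.left = Node(list(self.lo), left_hi, self.level + 1)
            self.right = Node(right_lo, list(self.hi), self.level + 1)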


Following the basic Metropolis schema, we are given a vector x, which we refer to as "the current sample", and its corresponding objective H(x). To choose a candidate point:

1. Begin at the root and, at each node, choose either the left or right child randomly, with probability n_L/(n_L + n_R) or n_R/(n_L + n_R), respectively. Descend the tree to a leaf, making left-right decisions in this way.

2. Upon reaching a leaf, generate the point y at random (uniformly) from the subspace defined by that leaf.

3. Compare x and y and make an accept/reject decision on y. This decision is discussed in more detail in the next subsection. If y is accepted, replace x by y as the current sample; else, if y is rejected, x remains the current sample.

4. If y was accepted, split the current leaf (containing y), and create two new daughter nodes, thus making more resolution available at this node if it is ever explored again.

5. Ascend the tree from the current sample to the root, updating n_L and n_R at each node.

2.2 Accept/reject decisions

The accept/reject decision of step 3 is similar to the familiar one used in simulated annealing, but must be slightly modified to compensate for the fact that y was not drawn from the search space with uniform probability. That is, the history of the search imposes a bias onto the search space. To see this, we consider the fact that the node values n_L and n_R depend on two factors: the occurrence of low energy values in leaves (the history of acceptance) and simply the history of the search itself -- areas that have been searched are more likely to be searched again. We must compensate for the second phenomenon.

Step 1 of the preceding procedure for growing and searching the tree implies a g(y|x) which does not actually depend on the current x. We will find this to be sufficient, and henceforth will write the probability of generating a candidate y as simply g(y), regardless of x. Note, however, that a g of this form violates the symmetry g(y) = g(x) (unless it is uniform). The resulting Metropolis procedure will not converge to the Gibbs distribution in general unless this asymmetry is accounted for. The following modification of Eq. (2.1) can be proven [5] to converge to the Gibbs density: accept y with probability

    min(1, g(x) p(y) / (g(y) p(x))).     (2.2)

g(y) is computed from the path of the descent down the tree by

    g(y) = (1/V_y) \prod_l a_l / (a_l + b_l)     (2.3)

where a_l represents the count (n_L or n_R) of the node visited at level l, according to which direction was chosen at that level; similarly, b_l represents the count of the direction not chosen.

Finally, in Eq. (2.3) we divide by V_y, the hypervolume of the leaf containing y, to ensure that g represents a proper probability density. In [5], we show that the Gibbs distribution is a stationary state of the algorithm with the modification of Equation (2.2).
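Combining the candidate-generation procedure of Section 2.1 with Equations (2.2) and (2.3) yields the following sketch of a single tree-annealing move. It reuses the hypothetical Node class sketched earlier and assumes the Gibbs form p(x) proportional to exp(-H(x)/T); the caller is assumed to keep track of g(x) for the current sample (for example, by drawing the initial x with the same descent).

    import math
    import random

    def descend(root):
        # Step 1: walk from the root to a leaf, recording the path and the factors of Eq. (2.3).
        node, path, g = root, [], 1.0
        while node.left is not None:
            total = node.nL + node.nR
            go_left = random.random() < node.nL / total
            g *= (node.nL if go_left else node.nR) / total
            path.append((node, go_left))
            node = node.left if go_left else node.right
        return node, path, g

    def volume(leaf):
        return math.prod(h - l for l, h in zip(leaf.lo, leaf.hi))

    def ta_move(root, x, gx, H, T):
        # One candidate generation and accept/reject decision (Eq. (2.2)).
        leaf, path, g_path = descend(root)
        y = [random.uniform(l, h) for l, h in zip(leaf.lo, leaf.hi)]  # step 2: uniform in the leaf
        gy = g_path / volume(leaf)                                    # Eq. (2.3)
        # Eq. (2.2): accept with probability min(1, g(x) p(y) / (g(y) p(x))), p proportional to exp(-H/T)
        ratio = (gx / gy) * math.exp(-(H(y) - H(x)) / T)
        if random.random() < min(1.0, ratio):
            leaf.split()                           # step 4: refine the accepted leaf
            for node, went_left in path:           # step 5: update counts along the accepted path
                if went_left:
                    node.nL += 1
                else:
                    node.nR += 1
            return y, gy
        return x, gx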

2.3 Annealing schedule

Following an original suggestion of Kirkpatrick et al. [19] to cool more slowly when the specific heat is high, we use T <- rT, where r = 1 - dS/C_v; dS is a small positive constant, and C_v is a term easily related to the variance of the energy. This form for r is one of a family of similar schedules which have been reported in the literature [1], and differs only in the power to which C_v is raised.
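A small sketch of this schedule. The estimate of C_v from the variance of recently observed energies, C_v ~ var(H)/T^2, is a standard identification borrowed from the annealing literature and is our assumption here; the lower bound placed on C_v is likewise our own safeguard, not part of the schedule as stated.

    import statistics

    def next_temperature(T, recent_energies, dS=0.05):
        # Cool by T <- r*T with r = 1 - dS/C_v: cool slowly when the specific heat C_v is high.
        C_v = statistics.variance(recent_energies) / (T * T)   # assumed estimate of C_v
        r = 1.0 - dS / max(C_v, 2.0 * dS)                      # our own guard so that 0 < r < 1
        return r * T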

3 Performance

In this section, we provide examples of the application of this new method to fitting data.

3.1 An asymptotic exponential

Figure 3 illustrates the result of using tree annealing to find the parameters of an exponential fit.

Figure 3. Results of fitting a damped exponential. The data are the dotted line; the fit is the solid line.

Our experience in attempting to find the parameters of the fit shown in Figure 3 is interesting. We repeat equation (1.1) here for reference:

    E = \sum_i [y_i - (a + b e^{c x_i})]^2

This expression represents the squared error resulting from fitting a data set with an exponential which saturates at a non-zero value. Straightforward attempts to produce a linear system of equations by taking logarithms fail, due to the additive constant a, and the minimization is therefore not solvable analytically. Applying simple gradient descent failed miserably, since the step size required for c is radically different from the appropriate step for a and b. Use of second derivatives to determine the correct step size required a fair amount of tedious algebra, but did result in a descent algorithm which would work if started in the proper place.

We were eventually able to determine that the error function was convex (had only one minimum), but suffered from the plateau problem discussed in section 1.1, so that if we started the descent at a large negative value of c (< -0.5), our algorithm never moved, since all derivatives were essentially zero.

This process required about two days of effort. We then typed the problem description into INTEROPT (described in the next section), and TA found the solution within about 30 seconds.
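For readers without access to INTEROPT, the same bounded fit can be reproduced with any off-the-shelf global optimizer searching the same box of parameter bounds. The sketch below uses SciPy's differential evolution purely as a stand-in for TA (it is not the authors' method), with invented data; the bounds mirror those entered in the INTEROPT dialogue of Figure 5.

    import numpy as np
    from scipy.optimize import differential_evolution

    # Invented noisy data from y = a + b*exp(c*x); the "true" values are hypothetical.
    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 10.0, 40)
    y = 35.0 - 12.0 * np.exp(-0.6 * x) + rng.normal(0.0, 0.4, x.size)

    def E(p):
        a, b, c = p
        return np.sum((y - (a + b * np.exp(c * x))) ** 2)   # Equation (1.1)

    # Bounded global search over the box 20 <= a <= 50, -20 <= b <= 20, -1 <= c <= 0.
    result = differential_evolution(E, bounds=[(20, 50), (-20, 20), (-1, 0)])
    print(result.x, result.fun)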

3.2 Response curves

Atkinson[2] has described the usefulness of equations of the general form

    V = r (\psi^a - \psi_0) 10^{-c\psi}     (3.1)

where r and a are parameters which must often be determined by data fitting. The expression presented in [2, Eq. 17], relating subjective magnitude of taste to concentration, is of this general form, but with the offset term set to zero:

    S = r W^a 10^{-cW}     (3.2)

where a is normally approximately 1.33. Although mean-squared error fits of equations of the form of Eq. 3.1 are not generally solvable by linear techniques, the more specific form of Eq. 3.2 is solvable using linear methods, simply by taking the log_{10} of both sides. However, we did not use this strategy. Instead, we simply typed in the data provided in Atkinson's paper, and the expression of Eq. 3.2, and again found the optimum immediately. Interestingly, we found a better fit at a = 1.6 than at the published 1.33. Then, as an added experiment, we re-introduced the offset term, putting Eq. 3.2 in the general form of Eq. 3.1,

    S = r (W^a + W_0) 10^{-cW}     (3.3)

and estimated all four parameters. Figure 4 shows the results (compare with Fig. 8 in [2], except that we plot S vs. W linearly rather than log-log).

Figure 4. Data from [2] and the result of fitting W_0, c, and a using TA.

Interestingly, by abandoning the need for linear methods, and allowing the W_0 term, we find that the optimal estimate for a is now 1.39, much closer to the predicted 1.33, but that W_0 is a (non-negligible) -2.76. Our purpose here is only to show the ease with which such experiments may be performed with TA.
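The linearization mentioned above is direct: taking log_{10} of both sides of Eq. (3.2) gives log_{10} S = log_{10} r + a log_{10} W - cW, which is linear in the unknowns log_{10} r, a, and c and can be solved by ordinary least squares. The values of W and S below are placeholders, not Atkinson's data, and the shortcut applies only to the zero-offset form (3.2).

    import numpy as np

    # Placeholder concentration/magnitude data; not Atkinson's published values.
    W = np.array([0.01, 0.03, 0.1, 0.3, 1.0])
    S = np.array([0.05, 0.20, 0.80, 2.10, 3.50])

    # log10 S = log10 r + a*log10 W - c*W, linear in (log10 r, a, c).
    A = np.column_stack([np.ones_like(W), np.log10(W), -W])
    coef, *_ = np.linalg.lstsq(A, np.log10(S), rcond=None)
    log10_r, a, c = coef
    print(10.0 ** log10_r, a, c)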


4 Interactive Optimization

As we used TA, and as our colleagues began to use it to solve their problems, it rapidly became apparent that a user-friendly interface to the optimization algorithm was needed. Consequently, one of the authors of this paper wrote a program called INTEROPT, which provides this interface. The user is asked to define his/her problem in a convenient, natural-language manner. INTEROPT then takes the problem definition, writes a program, compiles that program, and performs the optimization. In most instances, INTEROPT is able to solve general-purpose problems without needing any manual intervention (such as parameter adjustments) at all. Most of the problems described in the previous section, diverse as they are, were solved with the same software. An example INTEROPT dialogue is presented in Figure 5.

5 Conclusion

In this paper, we have introduced a new technique for solving nonlinear optimization problems -- in particular, those problems which have multiple minima: Tree Annealing. TA cannot (at this time) handle more than about 30 variables. Even with this restriction, TA seems to be a very exciting method, since NO analysis needs to be done to solve a large class of problems. With INTEROPT, the user simply types in the function to be minimized and the domain of the variables, and the program does the rest.

6 References

1. E.H.L. Aarts and P.J.M. van Laarhoven, Simulated Annealing: Theory and Applications. D. Reidel Publishing Company, Dordrecht, Holland, 1987.

2. Atkinson, William H., "A General Equation for Sensory Magnitude", Perception and Psychophysics, vol. 31(1), pp. 26-40, 1982.

3. V. Badami, N. Corby, N. Irwin, and K. Overton, "Using range data for inspecting printed wiring boards". In Proc. IEEE Int. Conf. on Robotics and Automation, Philadelphia, April 1988.

4. H. Baker and T. Binford, "A system for automated stereo mapping". In Proc. Image Understanding Workshop, pp. 215-222, Science Applications, Inc., 1982.

5. G. Bilbro, "A Metropolis Procedure with Asymmetric Neighborhoods". Technical report TR-, Center for Communications and Signal Processing, North Carolina State University, Raleigh, 1990.

6. G. Bilbro, M. Steer, R. Trew, and S. Skaggs, "Parameter extraction of microwave transistors using tree annealing", IEEE/Cornell Conf. Digest, August 1989.

7. C. Chow and T. Kanako, "Automatic boundary detection of the left ventricle from cineangiograms", Computers in Biomedical Research, 5:388-409, 1972.

8. G. Dahlquist and A. Bjorck, Numerical Methods, Prentice-Hall, Englewood Cliffs, NJ, 1974.

9. R. Duda and D. Nitzan, "Low level processing of registered intensity and range data". In Proc. 3rd Int. Joint Conf. Artificial Intelligence, 1976.


10. O. Faugeras et al., "Towards a flexible vision system". In Robot Vision, A. Pugh, ed., pp. 129-142, IFS, UK, 1982.

11. O. Faugeras and M. Herbert, "The representation, recognition, and locating of 3-D objects", International Journal of Robotics Research, 5(3), Fall 1986.

12. Greenlee, Mark W. and Magnussen, Svein, "Saturation of the Tilt Aftereffect", Vision Research, vol. 27, no. 6, pp. 1041-1043, 1987.

13. W. Grimson, From Images to Surfaces: A Computational Study of the Human Early Vision System. MIT Press, Cambridge, 1982.

14. Y. Han, W. Snyder, and G. Bilbro, "Determination of optimal pose using tree annealing". In Proc. IEEE Int. Conf. on Robotics and Automation, Cincinnati, May 1990.

15. M. Hollins, A. Goble, B. Whitsel, and M. Tommerdahl, "Time course and action spectrum of vibrotactile adaptation", Somatosensory and Motor Research, in review.

16. R. Jarvis, "A mobile robot for computer vision research". In Proc. Third Australian Comp. Sci. Conf., Australian Nat. Univ., Canberra, ACT, 1980.

17. R. Jarvis, "A laser time-of-flight range scanner for robotic vision", TR TR-CS-81-10, Comp. Sci. Dept., Australian Nat. Univ., Canberra, 1981.

18. Kalos, M.H. and P.A. Whitlock, Monte Carlo Methods, Volume 1: Basics, Wiley and Sons, New York, 1986.

19. S. Kirkpatrick, C. Gelatt, and M. Vecchi, "Optimization by simulated annealing", Science, 220(4598):671-680, May 1983.

20. R. Lewis and A. Johnston, "A scanning laser rangefinder for a robotic vehicle". In Proc. 5th Int. Joint Conf. Artificial Intelligence, 1977.

21. Magnussen, Svein and Johnson, Tore, "Temporal Aspects of Spatial Adaptation: A Study of the Tilt Aftereffect", Vision Research, vol. 26, no. 4, pp. 661-672, 1986.

22. D. Marr and T. Poggio, "A computational theory of human stereo vision". In Proc. Royal Society of London, (204), 1979.

23. Metropolis, N., A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller, "Equation of State Calculations by Fast Computing Machines", J. of Chem. Physics, 21 (1953) 1087-1092.

24. U. Mishra, A. Brown, and S. Rosenbaum, "DC and RF performance of 0.1u gate length Al.48In.52As - Ga.38In.62As pseudomorphic HEMTs", IEDM Technical Digest, December 1988.

25. H. Nishihara, "Prism: A practical real-time imaging stereo system". In Proc. of 3rd Int. Conf. on Robot Vision and Sensory Control, R. Brooks, ed., North-Holland, Amsterdam, 1983.


26. D. Nitzan, A. Brain, and R. Duda, "The measurement and use of registered reflectance and range data in scene analysis", Proc. IEEE, 65:206-220, Feb. 1977.

27. Y. Ohta and T. Kanade, "Stereo by intra- and interscanline search using dynamic programming", TR CMU-CS-83-162, Carnegie Mellon U., Pittsburgh, 1983.

28. M. Oshima and Y. Shirai, "Object recognition using 3-D information", IEEE PAMI, PAMI-5(4):353-361, 1983.

29. R. Paul, Robotic Manipulators: Mathematics, Programming and Control. MIT Press, Cambridge, Mass., 1981.

30. M. Peleg and O. Campanella, "On the mathematical form of psychophysical relationships with special focus on the perception of mechanical properties of solid objects", Perception and Psychophysics, vol. 44, no. 5, 1988.

31. R. Scobey and C. Johnson, "Psychophysical properties of displacement thresholds for moving targets", Acta Psychologica, vol. 48, 1981.

32. W. Snyder, Industrial Robots: Computer Interfacing and Control. Prentice-Hall, 1984.

33. W. Snyder, G. Bilbro, A. Logenthiran, and S. Rajala, "Optimal Thresholding -- a New Approach", Pattern Recognition Letters, in review.

34. B. Spain, Analytic Quadrics. Pergamon Press, 1960.

35. Y. Tsuchiya, E. Inuzuka, Y. Suzui, and W. Yu, "Ultrafast streak camera". In Proc. 13th Int. Congr. High Speed Photography and Photonics, Tokyo, Japan, August 1978.

36. D. Vanderbilt and G. Louie, "A Monte Carlo Simulated Annealing Approach to Optimization over Continuous Variables", J. Comput. Phys., 36 (1984) 259-271.

37. Virsu, Veijo and Lee, Barry B., "Light Adaptation in Cells of Macaque Lateral Geniculate Nucleus and Its Relation to Human Light Adaptation", Journal of Neurophysiology, vol. 50, no. 4, 1983.


%interopt
Do you want to optimize a function?no
Well then, would you like to fit some data?yes
How many parameters does the fit have?3
How many variables does the fit have?1
How many outputs does the fit have?1
Data file name?demos/data1
How many data points are in that file?9
Now I need to know the parameters over which I should minimize
What is the name of parameter number 0?a
What is the name of parameter number 1?b
What is the name of parameter number 2?c

Now, I need to know the names of the variables

DONT USE y!
What is the name of variable number 0?x
Enter the function to be fit; use the form y[i] = f(x), with a different y[i] for each output
y[0] = a + b * exp ( c * -x)
I'm now compiling your function. Please wait
I'm now compiling the main program. Please wait
OK. We got that far. Now, tell me the max and min values on each variable
What is the lower limit of a?20
What is the upper limit of a?50
What is the lower limit of b?-20
What is the upper limit of b?20
What is the lower limit of c?-1
What is the upper limit of c?0
I am now running your optimization:

Figure 5. Example application of INTEROPT: a dialogue in which the user defines an exponential-fitting problem.
