
Determination of Exponential Parameters

K. Irene Snyder Wesley E. Snyder

Center for Communications and Signal Processing, Department of Electrical and Computer Engineering, North Carolina State University

TR-91/11, August 1991

K. Irene Snyder, Department of Psychology, University of North Carolina at Chapel Hill

Wesley E. Snyder, Department of Radiology, Bowman Gray School of Medicine, Winston-Salem, NC

The problem of finding the parameters of exponential processes is discussed. Difficulties with numerical approaches based on descent are illustrated. A new method for finding those parameters is presented, which requires virtually no manual intervention, and which can find globally optimal estimates.

1 Introduction

Researchers in perception and psychophysics often find it necessary to plot data points or measurements in logarithmic curves against time. Displacement threshold determination[31], tilt aftereffect growth and decay[12,21], contrast and light adaptation[37], vibrotactile sensitivity[15], and many others produce data that is best fit by logarithmic curves. Stevens' power law and Atkinson's more general equation[2] show a generally exponential/logarithmic relationship between sensation magnitude and stimulus intensity.

Quantification of the physiological processes underlying the observed behavior often requires determining the parameters of an exponential function; however, a difficulty lies in determining the best fitting curve for data on an exponential or logarithmic curve, since data unfortunately rarely lies precisely on a simple exponential curve. In this paper, we address the fundamental problem of finding the best estimate of the parameters of an exponential-type function, given noisy measurements of that function. The resulting technique is demonstrated here for simple exponential functions, but is generally applicable to more complex functions such as those described by Atkinson[2]. Given an exponential function such as

    y = a + b e^{cx},

to find the values of the parameters a, b and c which provide the best fit, we find the curve which minimizes the error, or the distance between the data points and the curve. We describe this error by

    E = \sum_i [y_i - (a + b e^{c x_i})]^2     (1.1)

where the sum is taken over a set of measurements. Here, E is referred to as the mean squared error. The easiest way to find the minimum of such an expression is to find the zeroes of the derivative, and differentiating this equation gives us three derivatives:

    dE/da = -2 \sum_i [y_i - (a + b e^{c x_i})]
    dE/db = -2 \sum_i [y_i - (a + b e^{c x_i})] e^{c x_i}     (1.2)
    dE/dc = -2 \sum_i [y_i - (a + b e^{c x_i})] b x_i e^{c x_i}

An algebraic solution to this simultaneous system of equations is intractable. Thus, we are forced to consider numerical methods.

1.1 Gradient Descent

Gradient descent, a standard numerical technique for optimization, finds the minimum by stepping slowly downhill, since the gradient always points away from the minimum (or at least, from some minimum). Given some scalar function E(x) and a current estimate (at iteration k) x^{(k)}, to find the downhill point x^{(k+1)}, gradient descent uses

    x^{(k+1)} = x^{(k)} - E'(x^{(k)}) / E''(x^{(k)})

where E' and E'' are the first and second derivatives of E with respect to x. There are a number of problems inherent in gradient descent-type methods:

• Parameter sensitivity. The ranges of the parameters, and their sensitivity to perturbations, may easily differ by orders of magnitude. For example, the parameter c described in the equations above is critical to the stability of the algorithm, and very sensitive to small errors. To compensate for this sensitivity, we divide the gradient by the second derivative (in the scalar case), or by some norm of the Hessian for more complex problems. While use of the second derivatives solves some of the sensitivity problem, it introduces a second problem: algebraic tedium.

• Representational complexity. To perform gradient descent, one must analytically evaluate first and second partial derivatives. This algebraic process, while straightforward, is tedious and prone to error. Fortunately, there are now a number of symbolic math software packages which can be used to automate this process. For the many users who do not have easy access to such tools, the simple process of correctly differentiating complex expressions is tedious and time consuming at best.

• Plateaus. Exponential functions have a notorious tendency to produce plateaus in the corresponding MSE fit functions. For example, Figure 1 illustrates the error measure of Equation (1.1) plotted vs. c, with a and b held at their optimal values. The slope of this curve becomes arbitrarily small as one moves to the left along the c axis. If one chooses a stopping criterion for gradient descent such as "stop if the magnitude of the gradient is less than T", it is trivial to find such a point by moving along the c axis away from the minimum.

• Local minima. The primary flaw with gradient descent as a means of solving this type of minimization problem is that (unless it gets caught on a plateau) it finds the minimum nearest to the starting point, which may or may not be the absolute minimum.
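To make Equations (1.1) and (1.2) concrete, the short sketch below writes them out in Python for a synthetic data set; the arrays, noise level, and "true" parameter values are invented for illustration and are not taken from this paper. It also exposes the plateau discussed above: when c is very negative, e^{c x_i} is essentially zero for every sample, so the gradient components involving b and c nearly vanish.

    import numpy as np

    # Hypothetical noisy measurements of y = a + b*exp(c*x); the "true" values are invented.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 5.0, 50)
    y = (30.0 - 10.0 * np.exp(-0.8 * x)) + rng.normal(0.0, 0.5, x.size)

    def mse(params):
        # Equation (1.1): E = sum_i [y_i - (a + b*exp(c*x_i))]^2
        a, b, c = params
        return np.sum((y - (a + b * np.exp(c * x))) ** 2)

    def grad(params):
        # The three partial derivatives of E (Equation (1.2))
        a, b, c = params
        e = np.exp(c * x)
        r = y - (a + b * e)
        return np.array([-2.0 * np.sum(r),               # dE/da
                         -2.0 * np.sum(r * e),           # dE/db
                         -2.0 * np.sum(r * b * x * e)])  # dE/dc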

1.2 Simulated Annealing

The minimization technique known as simulated annealing [19,1] avoids the problems of local minima and plateaus, and may be described as follows:

1. Choose (at random) an initial value of x.
2. Generate a point y which is a "neighbor" of x.
3. If E(y) < E(x), y becomes the new value of x.
4. If E(y) > E(x), compute P_y = exp(-(E(y) - E(x))/T). If P_y > R (for R a random number uniformly distributed between 0 and 1), then accept y.
5. Decrease T slightly.
6. If T > T_min, go to 2.

In step 3, we perform a descent, so that we always fall "downhill". In step 4, we make it possible to sometimes move uphill and get out of a valley. Initially, we ignore T and note that if y represents an uphill move, the probability of accepting y is proportional to

e^{-(E(y) - E(x))/T}. Thus, uphill moves can occur, but are exponentially less likely to occur as the size of the uphill move becomes larger. The likelihood of an uphill move is, however, strongly influenced by T. If T is very large, all moves will be accepted. As T is gradually reduced, uphill moves become less likely until, for low values of T (T << E(y) - E(x)), such moves essentially cannot occur.
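A minimal sketch of the six-step procedure above, written for a single scalar variable. The Gaussian neighborhood, the initial temperature, and the geometric cooling factor are illustrative assumptions of this sketch, not prescriptions of the procedure itself.

    import math
    import random

    def simulated_annealing(E, x0, T=10.0, T_min=1e-3, cool=0.99):
        x = x0                                    # step 1: initial value of x
        while T > T_min:                          # step 6: repeat until T falls to T_min
            y = x + random.gauss(0.0, 1.0)        # step 2: a "neighbor" of x (assumed Gaussian)
            if E(y) < E(x):                       # step 3: always accept downhill moves
                x = y
            else:                                 # step 4: accept uphill moves if P_y > R
                if math.exp(-(E(y) - E(x)) / T) > random.random():
                    x = y
            T *= cool                             # step 5: decrease T slightly
        return x

    # e.g. simulated_annealing(lambda v: (v - 3.0) ** 2, x0=-10.0)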

In the case of combinatorial optimization, where all the variables take on only one of a small number (usually two) of possible values, the "neighbor" of a vector x_1 is another vector x_2 such that only one element of x_1 is changed to create x_2. For problems (such as fitting an exponential) where the variables take on continuous values, it is much more difficult to quantify the concept of "neighbor".

Figure 1. Fit error vs. the exponential parameter c (a and b held at their optimal values).

In the next section, we describe a new minimization strategy which handles continuously-valued variables.

2 Continuous Optimization (tree annealing)

Although some work has been done on extending SA to problems with continuous variables[36], the nature of the SA algorithm makes it best suited for solving problems in combinatorial optimization, in which the variables take on only discrete values. This is primarily due to the difficulty in specifying, for a particular problem, precisely what the "neighborhood" of a continuously-valued variable is. We will not, in this paper, attempt to survey all the other (that is, not based on SA) methods for continuous optimization. See [8] for more information.

In this section, we discuss a method for finding the minimum of functions of continuously-valued variables. We find it convenient to think of the optimization problem as a search: the minimum lies somewhere in a bounded hyperspace of dimension d. It is not practical to use any sort of array structure to store a representation of such a space, since the storage rapidly becomes prohibitive. Instead, we use a dynamic data structure.

The minimization method described here, which we call "tree annealing", is an extension of the familiar Metropolis [23] algorithm of simulated annealing, but handles continuously valued variables in a natural way. We assume we are searching for the minimum of some function H(x), where the d-dimensional vector x has continuously-valued elements. Furthermore, we assume a bounded search space S \subset R^d, which we represent with this dynamic data structure. We use a k-d tree in which each level of the tree represents a binary partition of one particular degree of freedom (DOF). Each node may thus be interpreted as representing a hyperrectangle, and its children therefore represent the smaller hyperrectangles resulting from dividing the parent along one particular DOF. In Figure 2, we illustrate a one-dimensional energy function, and show how a resulting partition tree provides more resolution in the vicinity of minima.

Figure 2. A one-dimensional search space and corresponding partition tree. It is important to remember that the tree is built using a random process; therefore, the tree is likely to possess more depth (resolution) in the vicinity of minima, and this figure shows only what the tree is likely to be.

In Figure 3, we illustrate a 2-D example of the partition tree, and show how nodes in the tree correspond to successive refinements.

Figure 3. A two-dimensional example (partitions shown at levels l = 0 and l = 2, with k = 0).

In general, for a d-dimensional problem, a node at level l will divide the kth degree of freedom in half, where k = l mod d. The two children of each node represent the subspaces resulting from dividing the parent subspace in half along the kth DOF. For example, in a 6-DOF problem, suppose the domain of DOF 4 is 0 <= x_4 <= 1. Then a node at level 22 represents a partition of DOF 4, a partition whose width is 2^{-\lceil 22/6 \rceil} = 0.0625.

The Metropolis algorithm proceeds as follows:(1)

1. Given a current estimate x in S, generate a "neighboring" point y in S, with generation probability g(y|x). It is necessary to assume that g is symmetric in the following sense: g(y|x) = g(x|y).

2. Accept the point y as the new estimate with probability

    min(1, p(y)/p(x)),     (2.1)

where the probabilities are Gibbs, i.e., of the form p(.) \propto exp(-H(.)/T).

Consider, for a moment, the form of Equation (2.1). In it, we point out that the decision to accept or reject y is based on a probability. A moment's reflection will convince the reader that this is equivalent to steps 3 and 4 of the simulated annealing algorithm. Omitting for the moment any discussion of annealing, that is, of reducing T, we consider how the simulated annealing strategy is modified by this "tree annealing" (TA) algorithm.

2.1 Growing and Searching the Tree

While the tree represents the entire search space, we build it with finer resolution in the vicinity of local minima. To see how this works, let us assume the process has been running for a while, and the tree has already been partially built. At each node, two numbers have been stored, n_L and n_R, which represent how many times in the past an acceptable point has been found in the left and right subtrees, respectively, of that node.

(1) This description is slightly different from the one presented earlier, since we want to emphasize the importance of the generation process.
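One plausible way to realize the nodes of this partition tree is sketched below. The field names and the choice to initialize n_L and n_R to 1 (so that unexplored subtrees retain a nonzero selection probability) are our own assumptions, not the authors' implementation.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Node:
        lo: List[float]                 # lower corner of the node's hyperrectangle
        hi: List[float]                 # upper corner of the node's hyperrectangle
        level: int = 0                  # depth l; this node splits DOF k = l mod d
        nL: int = 1                     # past acceptances found in the left subtree
        nR: int = 1                     # past acceptances found in the right subtree
        left: Optional["Node"] = None
        right: Optional["Node"] = None

        def split(self):
            # Halve this hyperrectangle along DOF k = l mod d, creating two children.
            k = self.level % len(self.lo)
            mid = 0.5 * (self.lo[k] + self.hi[k])
            left_hi = list(self.hi); left_hi[k] = mid
            right_lo = list(self.lo); right_lo[k] = mid
            self.left = Node(list(self.lo), left_hi, self.level + 1)
            self.right = Node(right_lo, list(self.hi), self.level + 1)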


Following the basic Metropolis schema, we are given a vector x, which we refer to as "the current sample", and its corresponding objective H(x). To choose a candidate point:

1. Begin at the root and, at each node, choose either the left or right child randomly, with probability n_L/(n_L + n_R) or n_R/(n_L + n_R), respectively. Descend the tree to a leaf, making left-right decisions in this way.

2. Upon reaching a leaf, generate the point y at random (uniformly) from the subspace defined by that leaf.

3. Compare x and y and make an accept/reject decision on y. This decision is discussed in more detail in the next subsection. If y is accepted, replace x by y as the current sample; else, if y is rejected, x remains the current sample.

4. If y was accepted, split the current leaf (containing y), and create two new daughter nodes, thus making more resolution available at this node if it is ever explored again.

5. Ascend the tree from the current sample to the root, updating n_L and n_R at each node.

2.2 Accept/reject decisions

The accept/reject decision of step 3 is similar to the familiar one used in simulated annealing, but must be slightly modified to compensate for the fact that y was not drawn from the search space with uniform probability. That is, the history of the search imposes a bias onto the search space. To see this, we consider the fact that the node values n_L and n_R depend on two factors: the occurrence of low energy values in leaves (the history of acceptance) and simply the history of the search itself -- areas that have been searched are more likely to be searched again. We must compensate for the second phenomenon.

Step 1 of the preceding procedure for growing and searching the tree implies a g(y|x) which does not actually depend on the current x. We will find this to be sufficient, and henceforth will write the probability of generating a candidate y as simply g(y), regardless of x. Note, however, that a g of this form violates the symmetry g(y) = g(x) (unless it is uniform). The resulting Metropolis procedure will not converge to the Gibbs distribution in general unless this asymmetry is accounted for. The following modification of Eq. (2.1) can be proven [5] to converge to the Gibbs density: accept y with probability

    min(1, g(x) p(y) / (g(y) p(x))).     (2.2)

g(y) is computed from the path of the descent down the tree by

    g(y) = (1/V_y) \prod_l a_l / (a_l + b_l)     (2.3)

where a_l represents the count (n_L or n_R) of the node visited at level l, according to which direction was chosen at that level; similarly, b_l represents the count of the direction not chosen.

Finally, in Eq. (2.3) we divide by V_y, the hypervolume of the leaf containing y, to ensure that g represents a proper probability density. In [5], we show that the Gibbs distribution is a stationary state of the algorithm with the modification of Equation (2.2).
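Combining the candidate-generation procedure of Section 2.1 with Equations (2.2) and (2.3) yields the following sketch of a single tree-annealing move. It reuses the hypothetical Node class sketched earlier and assumes the Gibbs form p(x) proportional to exp(-H(x)/T); the caller is assumed to keep track of g(x) for the current sample (for example, by drawing the initial x with the same descent).

    import math
    import random

    def descend(root):
        # Step 1: walk from the root to a leaf, recording the path and the factors of Eq. (2.3).
        node, path, g = root, [], 1.0
        while node.left is not None:
            total = node.nL + node.nR
            go_left = random.random() < node.nL / total
            g *= (node.nL if go_left else node.nR) / total
            path.append((node, go_left))
            node = node.left if go_left else node.right
        return node, path, g

    def volume(leaf):
        return math.prod(h - l for l, h in zip(leaf.lo, leaf.hi))

    def ta_move(root, x, gx, H, T):
        # One candidate generation and accept/reject decision (Eq. (2.2)).
        leaf, path, g_path = descend(root)
        y = [random.uniform(l, h) for l, h in zip(leaf.lo, leaf.hi)]  # step 2: uniform in the leaf
        gy = g_path / volume(leaf)                                    # Eq. (2.3)
        # Eq. (2.2): accept with probability min(1, g(x) p(y) / (g(y) p(x))), p proportional to exp(-H/T)
        ratio = (gx / gy) * math.exp(-(H(y) - H(x)) / T)
        if random.random() < min(1.0, ratio):
            leaf.split()                           # step 4: refine the accepted leaf
            for node, went_left in path:           # step 5: update counts along the accepted path
                if went_left:
                    node.nL += 1
                else:
                    node.nR += 1
            return y, gy
        return x, gx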

2.3 Annealing schedule

Following an original suggestion of Kirkpatrick et al. [19] to cool more slowly when the specific heat is high, we use T <- rT, where r = 1 - dS/C_v; dS is a small positive constant, and C_v is a term easily related to the variance of the energy. This form for r is one of a family of similar schedules which have been reported in the literature [1], and differs only in the power to which C_v is raised.
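A small sketch of this schedule. The estimate of C_v from the variance of recently observed energies, C_v ~ var(H)/T^2, is a standard identification borrowed from the annealing literature and is our assumption here; the lower bound placed on C_v is likewise our own safeguard, not part of the schedule as stated.

    import statistics

    def next_temperature(T, recent_energies, dS=0.05):
        # Cool by T <- r*T with r = 1 - dS/C_v: cool slowly when the specific heat C_v is high.
        C_v = statistics.variance(recent_energies) / (T * T)   # assumed estimate of C_v
        r = 1.0 - dS / max(C_v, 2.0 * dS)                      # our own guard so that 0 < r < 1
        return r * T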

3 Performance

In this section, we provide examples of the application of this new method to fitting data.

3.1 An asymptotic exponential

Figure 3 illustrates the result of using tree annealing to find the parameters of an exponential fit.

Figure 3. Results of fitting a damped exponential. The data are the dotted line; the fit is the solid line.

Our experience in attempting to find the parameters of the fit shown in Figure 3 is interesting. We repeat equation (1.1) here for reference:

    E = \sum_i [y_i - (a + b e^{c x_i})]^2

This expression represents the squared error resulting from fitting a data set with an exponential which saturates at a non-zero value. Straightforward attempts to produce a linear system of equations by taking logarithms fail, due to the additive constant a, and the minimization is therefore not solvable analytically. Applying simple gradient descent failed miserably, since the step size required for c is radically different from the appropriate step for a and b. Use of second derivatives to determine the correct step size required a fair amount of tedious algebra, but did result in a descent algorithm which would work if started in the proper place.

We were eventually able to determine that the error function was convex (had only one minimum), but suffered from the plateau problem discussed in section 1.1, so that if we started the descent at a large negative value of c (< -0.5), our algorithm never moved, since all derivatives were essentially zero.

This process required about two days of effort. We then typed the problem description into INTEROPT (described in the next section), and TA found the solution within about 30 seconds.
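For readers without access to INTEROPT, the same bounded fit can be reproduced with any off-the-shelf global optimizer searching the same box of parameter bounds. The sketch below uses SciPy's differential evolution purely as a stand-in for TA (it is not the authors' method), with invented data; the bounds mirror those entered in the INTEROPT dialogue of Figure 5.

    import numpy as np
    from scipy.optimize import differential_evolution

    # Invented noisy data from y = a + b*exp(c*x); the "true" values are hypothetical.
    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 10.0, 40)
    y = 35.0 - 12.0 * np.exp(-0.6 * x) + rng.normal(0.0, 0.4, x.size)

    def E(p):
        a, b, c = p
        return np.sum((y - (a + b * np.exp(c * x))) ** 2)   # Equation (1.1)

    # Bounded global search over the box 20 <= a <= 50, -20 <= b <= 20, -1 <= c <= 0.
    result = differential_evolution(E, bounds=[(20, 50), (-20, 20), (-1, 0)])
    print(result.x, result.fun)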

3.2 Response curves

Atkinson[2] has described the usefulness of equations of the general form

    V = r (\psi^a - \psi_0) 10^{-c\psi}     (3.1)

where r and a are parameters which must often be determined by data fitting. The expression presented in [2, Eq. 17], relating subjective magnitude of taste to concentration, is of this general form, but with the offset term set to zero:

    S = r W^a 10^{-cW}     (3.2)

where a is normally approximately 1.33. Although mean-squared error fits of equations of the form of Eq. 3.1 are not generally solvable by linear techniques, the more specific form of Eq. 3.2 is solvable using linear methods, simply by taking the log_{10} of both sides. However, we did not use this strategy. Instead, we simply typed in the data provided in Atkinson's paper, and the expression of Eq. 3.2, and again found the optimum immediately. Interestingly, we found a better fit at a = 1.6 than at the published 1.33. Then, as an added experiment, we re-introduced the offset term, putting Eq. 3.2 in the general form of Eq. 3.1,

    S = r (W^a + W_0) 10^{-cW}     (3.3)

and estimated all four parameters. Figure 4 shows the results (compare with Fig. 8 in [2], except that we plot S vs. W linearly rather than log-log).

Figure 4. Data from [2] and the result of fitting W_0, c, and a using TA.

Interestingly, by abandoning the need for linear methods, and allowing the W_0 term, we find that the optimal estimate for a is now 1.39, much closer to the predicted 1.33, but that W_0 is a (non-negligible) -2.76. Our purpose here is only to show the ease with which such experiments may be performed with TA.
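The linearization mentioned above is direct: taking log_{10} of both sides of Eq. (3.2) gives log_{10} S = log_{10} r + a log_{10} W - cW, which is linear in the unknowns log_{10} r, a, and c and can be solved by ordinary least squares. The values of W and S below are placeholders, not Atkinson's data, and the shortcut applies only to the zero-offset form (3.2).

    import numpy as np

    # Placeholder concentration/magnitude data; not Atkinson's published values.
    W = np.array([0.01, 0.03, 0.1, 0.3, 1.0])
    S = np.array([0.05, 0.20, 0.80, 2.10, 3.50])

    # log10 S = log10 r + a*log10 W - c*W, linear in (log10 r, a, c).
    A = np.column_stack([np.ones_like(W), np.log10(W), -W])
    coef, *_ = np.linalg.lstsq(A, np.log10(S), rcond=None)
    log10_r, a, c = coef
    print(10.0 ** log10_r, a, c)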


4 Interactive Optimization

As we used TA, and as our colleagues began to use it to solve their problems, it rapidly became apparent that a user-friendly interface to the optimization algorithm was needed. Consequently, one of the authors of this paper wrote a program called INTEROPT, which provides this interface. The user is asked to define his/her problem in a convenient, natural-language manner. INTEROPT then takes the problem definition, writes a program, compiles that program, and performs the optimization. In most instances, INTEROPT is able to solve general-purpose problems without needing any manual intervention (such as parameter adjustments) at all. Most of the problems described in the previous section, diverse as they are, were solved with the same software. An example INTEROPT dialogue is presented in Figure 5.

5 Conclusion

In this paper, we have introduced a new technique for solving nonlinear optimization problems -- in particular, those problems which have multiple minima: Tree Annealing. TA cannot (at this time) handle more than about 30 variables. Even with this restriction, TA seems to be a very exciting method, since NO analysis needs to be done to solve a large class of problems. With INTEROPT, the user simply types in the function to be minimized and the domain of the variables, and the program does the rest.

6 References

1. E.H.L. Aarts and P.J.M. van Laarhoven, Simulated Annealing: Theory and Applications. D. Reidel Publishing Company, Dordrecht, Holland, 1987.

2. Atkinson, William H., "A General Equation for Sensory Magnitude", Perception and Psychophysics, vol. 31(1), pp. 26-40, 1982.

3. V. Badami, N. Corby, N. Irwin, and K. Overton, "Using range data for inspecting printed wiring boards". In Proc. IEEE Int. Conf. on Robotics and Automation, Philadelphia, April 1988.

4. H. Baker and T. Binford, "A system for automated stereo mapping". In Proc. Image Understanding Workshop, pp. 215-222, Science Applications, Inc., 1982.

5. G. Bilbro, "A Metropolis Procedure with Asymmetric Neighborhoods". Technical report TR-, Center for Communications and Signal Processing, North Carolina State University, Raleigh, 1990.

6. G. Bilbro, M. Steer, R. Trew, and S. Skaggs, "Parameter extraction of microwave transistors using tree annealing", IEEE/Cornell Conf. Digest, August 1989.

7. C. Chow and T. Kanako, "Automatic boundary detection of the left ventricle from cineangiograms", Computers in Biomedical Research, 5:388-409, 1972.

8. G. Dahlquist and A. Bjorck, Numerical Methods, Prentice-Hall, Englewood Cliffs, NJ, 1974.

9. R. Duda and D. Nitzan, "Low level processing of registered intensity and range data". In Proc. 3rd Int. Joint Conf. Artificial Intelligence, 1976.


10. O. Faugeras et al., "Towards a flexible vision system". In Robot Vision, A. Pugh, ed., pp. 129-142, IFS, UK, 1982.

11. O. Faugeras and M. Herbert, "The representation, recognition, and locating of 3-D objects", International Journal of Robotics Research, 5(3), Fall 1986.

12. Greenlee, Mark W. and Magnussen, Svein, "Saturation of the Tilt Aftereffect", Vision Research, vol. 27, no. 6, pp. 1041-1043, 1987.

13. W. Grimson, From Images to Surfaces: A Computational Study of the Human Early Vision System. MIT Press, Cambridge, 1982.

14. Y. Han, W. Snyder, and G. Bilbro, "Determination of optimal pose using tree annealing". In Proc. IEEE Int. Conf. on Robotics and Automation, Cincinnati, May 1990.

15. M. Hollins, A. Goble, B. Whitsel, and M. Tommerdahl, "Time course and action spectrum of vibrotactile adaptation", Somatosensory and Motor Research, in review.

16. R. Jarvis, "A mobile robot for computer vision research". In Proc. Third Australian Comp. Sci. Conf., Australian Nat. Univ., Canberra, ACT, 1980.

17. R. Jarvis, "A laser time-of-flight range scanner for robotic vision", TR TR-CS-81-10, Comp. Sci. Dept., Australian Nat. Univ., Canberra, 1981.

18. Kalos, M.H. and P.A. Whitlock, Monte Carlo Methods, Volume 1: Basics, Wiley and Sons, New York, 1986.

19. S. Kirkpatrick, C. Gelatt, and M. Vecchi, "Optimization by simulated annealing", Science, 220(4598):671-680, May 1983.

20. R. Lewis and A. Johnston, "A scanning laser rangefinder for a robotic vehicle". In Proc. 5th Int. Joint Conf. Artificial Intelligence, 1977.

21. Magnussen, Svein and Johnson, Tore, "Temporal Aspects of Spatial Adaptation: A Study of the Tilt Aftereffect", Vision Research, vol. 26, no. 4, pp. 661-672, 1986.

22. D. Marr and T. Poggio, "A computational theory of human stereo vision". In Proc. Royal Society of London, (204), 1979.

23. Metropolis, N., A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller, "Equation of State Calculations by Fast Computing Machines", J. of Chem. Physics, 21 (1953) 1087-1092.

24. U. Mishra, A. Brown, and S. Rosenbaum, "DC and RF performance of 0.1u gate length Al.48In.52As - Ga.38In.62As pseudomorphic HEMTs", IEDM Technical Digest, December 1988.

25. H. Nishihara, "Prism: A practical real-time imaging stereo system". In Proc. of 3rd Int. Conf. on Robot Vision and Sensory Control, R. Brooks, ed., North-Holland, Amsterdam, 1983.


26. D. Nitzan, A. Brain, and R. Duda, "The measurement and use of registered reflectance and range data in scene analysis", Proc. IEEE, 65:206-220, Feb. 1977.

27. Y. Ohta and T. Kanade, "Stereo by intra- and interscanline search using dynamic programming", TR CMU-CS-83-162, Carnegie Mellon U., Pittsburgh, 1983.

28. M. Oshima and Y. Shirai, "Object recognition using 3-D information", IEEE PAMI, PAMI-5(4):353-361, 1983.

29. R. Paul, Robotic Manipulators: Mathematics, Programming and Control. MIT Press, Cambridge, Mass., 1981.

30. M. Peleg and O. Campanella, "On the mathematical form of psychophysical relationships with special focus on the perception of mechanical properties of solid objects", Perception and Psychophysics, vol. 44, no. 5, 1988.

31. R. Scobey and C. Johnson, "Psychophysical properties of displacement thresholds for moving targets", Acta Psychologica, vol. 48, 1981.

32. W. Snyder, Industrial Robots: Computer Interfacing and Control. Prentice-Hall, 1984.

33. W. Snyder, G. Bilbro, A. Logenthiran, and S. Rajala, "Optimal Thresholding -- a New Approach", Pattern Recognition Letters, in review.

34. B. Spain, Analytic Quadrics. Pergamon Press, 1960.

35. Y. Tsuchiya, E. Inuzuka, Y. Suzui, and W. Yu, "Ultrafast streak camera". In Proc. 13th Int. Congr. High Speed Photography and Photonics, Tokyo, Japan, August 1978.

36. D. Vanderbilt and G. Louie, "A Monte Carlo Simulated Annealing Approach to Optimization over Continuous Variables", J. Comput. Phys., 36 (1984) 259-271.

37. Virsu, Veijo and Lee, Barry B., "Light Adaptation in Cells of Macaque Lateral Geniculate Nucleus and Its Relation to Human Light Adaptation", Journal of Neurophysiology, vol. 50, no. 4, 1983.


%interopt
Do you want to optimize a function?no
Well then, would you like to fit some data?yes
How many parameters does the fit have?3
How many variables does the fit have?1
How many outputs does the fit have?1
Data file name?demos/data1
How many data points are in that file?9
Now I need to know the parameters over which I should minimize
What is the name of parameter number 0?a
What is the name of parameter number 1?b
What is the name of parameter number 2?c

Now, I need to know the names of the variables

DONT USE y!
What is the name of variable number 0?x
Enter the function to be fit; use the form y[i] = f(x), with a different y[i] for each output
y[0] = a + b * exp ( c * -x)
I'm now compiling your function. Please wait
I'm now compiling the main program. Please wait
OK. We got that far. Now, tell me the max and min values on each variable
What is the lower limit of a?20
What is the upper limit of a?50
What is the lower limit of b?-20
What is the upper limit of b?20
What is the lower limit of c?-1
What is the upper limit of c?0
I am now running your optimization:

Figure 5. Example application of INTEROPT: a dialogue in which the user defines an exponential-fitting problem.
