On the Efficient Identification of an Inflection Point
Total Page:16
File Type:pdf, Size:1020Kb
INTERNATIONAL JOURNAL OF MATHEMATICS AND SCIENTIFIC COMPUTING (ISSN: 2231-5330), VOL. 6, NO. 1, 2016 13 On the Efficient Identification of an Inflection Point Demetris T. Christopoulos Abstract—Our task is to find time efficient and statistically Our main task is to evaluate a total of six methods, in order consistent estimators for revealing the true inflection point of a to find the most efficient one and use it to estimate the mode planar curve when we have only a probably noisy set of points for when we have massive data sets. Paper structure: 1. introduc- it, thus we are introducing extremum surface (ESE) and distance estimator (EDE) methods. The analysis is based on the geometric tion, 2. presentation of ESE & EDE methods, 3. generalisation properties of the inflection point for a smooth function. Iterative to their iterative versions, 4. evaluation and application limits, versions of the methods are also given and tested. Numerical 5. an example of a real data set, 6. conclusion – proofs are experiments are performed for the class of sigmoid curves and given as Appendix. comparison with other available procedures is carried out. It is proven that both methods are quite fast in computational execution. Under a rather common noise type EDE can give II. EXTREMUM SURFACE AND DISTANCE ESTIMATOR a 96% confidence interval, while it always provides estimations METHODS for data with more than a million cases at a negligible execution A. Required concepts time. An alternative way of mode computation for a distribution by using its CDF is given as a real massive data example. Let a function f :[a, b] → R, f ∈ C(n), n ≥ 2 which is Index Terms—Inflection point identification, Efficient convex for x ∈ [a, p] and concave for x ∈ [p, b], p is the unique computation, Mode estimation, Iterarive methods. inflection point of f in [a, b] and let an arbitrary x ∈ [a, b]. Definition 2.1: Total, left and right chord are the lines MSC 2010 Codes – 62F12, 65D17, 65D99 connecting points {(a, f(a)), (b, f(b))}, {(a, f(a)), (x, f(x))} and {(x, f(x)), (b, f(b))} with Cartesian equations g(x), l(x) I. INTRODUCTION and r(x) respectively. Distances from chords are the functions F, Fl,Fr :[a, b] → R with The sigmoid or S-shaped model is common in many dis- F (x) = f(x)−g(x),Fl(x) = f(x)−l(x),Fr(x) = f(x)−r(x) ciplines like artificial neural networks [10], utility theory [7] (1) & technological substitution [12], [14] in Economics, growth By using elementary Analytical Geometry we can prove that theory [2] & allosterism [1] in Biology, population dynam- b f (a) − a f (b) f (b) − f (a) ics [20], [18] in Ecology, autocatalysis [19] in Analytical F (x) = f (x) − − x (2) Chemistry, dose-response [13] in Medicine and many others. b − a b − a f (a) − f (x)) A key concept for every sigmoid is the inflection point and Fl(t) = f (t) − f (a) + (t − a) , t ∈ [a, x] (3) its accurate and fast computation while the available methods x − a can be classified to (i) adoption of a suitable process - test f (b) − f (x) Fr(t) = f (t) − f (x) + (t − x) , t ∈ [x, b] (4) function - then use of regression or maximum likelihood b − x estimation techniques, (ii) smoothing data under the restriction that second order divided differences change sign once [5], Definition 2.2: The s-left (sl(a, x)) and s-right (sr(b, x)) (iii) use of Differential Geometry to define a proper measure are the algebraic surfaces of discrete curvature and then searching for the least measure x b s (a, x) = F (t)dt , s (b, x) = F (t)dt (5) point [16], [9], Gaussian smoothing techniques [15] and shape l a l r x r R R analysis procedures [11]. The x-left (xl) and x-right (xr) are x–values such that At this paper we are presenting two geometric methods xl = argmin {sl (a, x)} , xr = argmax {sr (b, x)} in order to find the inflection point of a curve, when we x∈[a,b+δ1] x∈[a−δ2,b] know only a set of points {(xi, yi) , i = 0, 1, . , n} for it, (6) the Extremum Surface Estimator (ESE method) which works with δ1, δ2 > 0 taken as small as necessary for xl, xr to be with planar surfaces and the Extremum Distance Estimator unique unconstrained extremes in the corresponding intervals. (EDE method) which uses only planar distances. We do not A graphical illustration of the above defined left and right- perform neither regression nor splines representation analysis. terms is presented at Fig. 1 and 2 where we observe that In addition, no concepts like discrete or digital curvature are when x = xl, x = xr then we achieve the algebraic minimum defined or used. Instead we focus on finding each time two sl(a, x) and maximum sr(b, x), respectively. The motivation proper points where the true inflection point lies between and for the left- and right- naming was due to the origin of the then we take as an estimator their middle point, just like chords: left/right starts from the left/right edge of the graph. bisection method works in root finding. Definition 2.3: Normal algebraic distance of the curve point (x, f(x)) from the total chord Def. 2.1 is function N :[a, b] → Demetris T. Christopoulos is with the Department of Economics, R with National and Kapodistrian University of Athens, Greece. (E-mail: (f(b)−f(a))x−(b−a)f(x)+b f(a)−a f(b) N(x) = (7) [email protected]) − √(f(b)−f(a))2+(b−a)2 INTERNATIONAL JOURNAL OF MATHEMATICS AND SCIENTIFIC COMPUTING (ISSN: 2231-5330), VOL. 6, NO. 1, 2016 14 mean, for example the uniform U(−r, r) or normal N(0, σ2). If error is distributed without a zero mean, then results are ambiguous. Definition 2.4: For the noisy data Eq. 9 we define the discrete distances from total, left and right chord as the values Yi = yi−g(xi),Yli = yi−l(xi),Yri = yi−r(xi), i = 0, . , n (10) Definition 2.5: Function f, around its inflection point p, is (a) x <p (b) x = p called symmetric when f (p + x) + f (p − x) − 2 f (p) = 0, ∀x ∈ R (11) and is called locally (ǫ, δ) asymptotically symmetric when: |f(p + x) + f(p − x) − 2f(p)| < ǫ, ∀x ∈ (p−δ, p+δ) (12) Definition 2.6: The function f :[a, b] → R, with respect to its inflection point p, has 1) Data symmetry if p − b = a − p (c) x = xl (d) x >xl 2) Data left asymmetry if p − b < a − p Fig. 1: Illustration of the algebraic surfaces sl(a, x) < 0 and 3) Data right asymmetry if p − b > a − p x . l The f :[a, b] → R is characterised as totally symmetric, if it is symmetric around p and has also data symmetry. Definition 2.7: For every subsequent xi < xj the elemen- tary trapezoidal estimation is xj f(xi) + f(xj ) f(x)dx ≈ Ti,j (f, xi, xj ) = (xj − xi) Zxi 2 (13) And for every standard partition the total trapezoidal estima- tion is b n−1 (a) x >p (b) x = p f(x)dx ≈ Tn+1(f, a, b) = Ti,i+1(f, xi, xi+1) (14) Z a Xi=0 B. The Extremum Surface Method We can prove that Lemma 2.1: The x-left (xl), x-right (xr) are abscissae such that left, right chord respectively are tangent to the graph G(f). Corollary 2.1: For Eq. 6 it holds ′ f(x)−f(a) (c) x = xr (d) x <xr xl = arg x f (x) = x−a ∈ x [a,b+δ1] n o (15) Fig. 2: Illustration of the algebraic surfaces sr(b, x) > 0 and ′ f(b)−f(x) xr = arg x f (x) = − xr. b x x∈[a−δ2,b] n o with δ1, δ2 > 0 taken as small as necessary for xl, xr to be For a convex/concave curve the above definition gives N(x) < unique unconstrained solutions in the corresponding intervals. 0 when x < p and N(x) > 0 if x > p. For a not necessarily This tangency condition is illustrated at Fig. 1(c) and 2(c). strictly sorted grid of n + 1 not equally spaced points {xi} of Corollary 2.2: Let a function f :[a, b] → R, f ∈ Eq. 8 –standard partition– we add errors to the values of a C(n), n ≥ 2 which is convex for x ∈ [a, p] and concave for known function f and create the noisy data set {(xi, yi)} by x ∈ [p, b]. Then we have one of the following possibilities: the process of Eq. 9. 1) If xl, xr ∈ [a, b] then a ≤ xr < xl ≤ b 2) If xl ∈/ [a, b] then xl > b a = x0 ≤ x1 ≤ ... ≤ xn = b (8) 2 3) If xr ∈/ [a, b] then xr < a yi = f(xi) + ǫi, ǫi ∼ iid(0, σ ) (9) We define the next theoretical estimator of the inflection point: Our analysis is applicable for every distribution with zero INTERNATIONAL JOURNAL OF MATHEMATICS AND SCIENTIFIC COMPUTING (ISSN: 2231-5330), VOL. 6, NO. 1, 2016 15 Definition 2.8: The theoretical extremum surface estimator (TESE) is xl+xr 2 , xl, xr ∈ [a, b] b+xr xS = 2 , xl > b (16) xl+a 2 , xr < a Lemma 2.2: If the mesh λ(n) of the standard partition is 2 such that lim nλ(n) = 0 then Tn+1(y, a, b) is a consistent n→∞ estimator of the Tn+1(f, a, b).