How SVMs Can Estimate Quantiles and the Median

Ingo Steinwart
Information Sciences Group CCS-3
Los Alamos National Laboratory
Los Alamos, NM 87545, USA
[email protected]

Andreas Christmann
Department of Mathematics
Vrije Universiteit Brussel
B-1050 Brussels, Belgium
[email protected]

Abstract

We investigate quantile regression based on the pinball loss and the $\epsilon$-insensitive loss. For the pinball loss a condition on the data-generating distribution $P$ is given that ensures that the conditional quantiles are approximated with respect to $\|\cdot\|_1$. This result is then used to derive an oracle inequality for an SVM based on the pinball loss. Moreover, we show that SVMs based on the $\epsilon$-insensitive loss estimate the conditional median only under certain conditions on $P$.

1 Introduction

Let $P$ be a distribution on $X \times Y$, where $X$ is an arbitrary set and $Y \subset \mathbb{R}$ is closed. The goal of quantile regression is to estimate the conditional quantile, i.e., the set-valued function

$$F^*_{\tau,P}(x) := \bigl\{ t \in \mathbb{R} : P\bigl((-\infty, t] \mid x\bigr) \ge \tau \text{ and } P\bigl([t, \infty) \mid x\bigr) \ge 1 - \tau \bigr\}, \qquad x \in X,$$

where $\tau \in (0, 1)$ is a fixed constant and $P(\,\cdot \mid x)$, $x \in X$, is the (regular) conditional probability. For conceptual simplicity (though mathematically this is not necessary) we assume throughout this paper that $F^*_{\tau,P}(x)$ consists of singletons, i.e., there exists a function $f^*_{\tau,P} : X \to \mathbb{R}$, called the conditional $\tau$-quantile function, such that $F^*_{\tau,P}(x) = \{ f^*_{\tau,P}(x) \}$, $x \in X$.

Let us now consider the so-called $\tau$-pinball loss $L_\tau : \mathbb{R} \times \mathbb{R} \to [0, \infty)$ defined by $L_\tau(y, t) := \psi_\tau(y - t)$, where $\psi_\tau(r) = (\tau - 1) r$ if $r < 0$, and $\psi_\tau(r) = \tau r$ if $r \ge 0$. Moreover, given a (measurable) function $f : X \to \mathbb{R}$ we define the $L_\tau$-risk of $f$ by

$$\mathcal{R}_{L_\tau,P}(f) := \mathbb{E}_{(x,y) \sim P}\, L_\tau(y, f(x)).$$

Now recall that $f^*_{\tau,P}$ is, up to zero sets, the only function that minimizes the $L_\tau$-risk, i.e.,

$$\mathcal{R}_{L_\tau,P}(f^*_{\tau,P}) = \inf \mathcal{R}_{L_\tau,P}(f) =: \mathcal{R}^*_{L_\tau,P},$$

where the infimum is taken over all $f : X \to \mathbb{R}$.
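
The link between the pinball loss and quantiles can be illustrated on a finite sample. The following is a minimal sketch, not part of the paper: it uses only the Python standard library, and the helper names (`pinball_loss`, `empirical_risk`, `best_constant`), the Gaussian sample, and the choice $\tau = 0.25$ are illustrative assumptions. It minimizes the empirical $L_\tau$-risk over constant predictions and checks that the minimizer satisfies the two defining inequalities of a $\tau$-quantile for the empirical distribution.

```python
import random


def pinball_loss(y: float, t: float, tau: float) -> float:
    """tau-pinball loss L_tau(y, t) = psi_tau(y - t)."""
    r = y - t
    return tau * r if r >= 0 else (tau - 1.0) * r


def empirical_risk(ys, t, tau):
    """Empirical L_tau-risk of the constant prediction t."""
    return sum(pinball_loss(y, t, tau) for y in ys) / len(ys)


def best_constant(ys, tau):
    """Minimize the empirical risk over constants.

    The risk is convex and piecewise linear in t with kinks at the
    sample points, so it suffices to search over the sample itself.
    """
    return min(sorted(ys), key=lambda t: empirical_risk(ys, t, tau))


# Hypothetical data: 501 standard Gaussian draws (an assumption for illustration).
random.seed(0)
ys = [random.gauss(0.0, 1.0) for _ in range(501)]
tau = 0.25

t_hat = best_constant(ys, tau)
frac_below = sum(y <= t_hat for y in ys) / len(ys)  # empirical P((-inf, t_hat])
frac_above = sum(y >= t_hat for y in ys) / len(ys)  # empirical P([t_hat, inf))
print(t_hat, frac_below, frac_above)
```

Searching over the sample points is a brute-force choice made for transparency; sorting and picking the appropriate order statistic would give the same constant more cheaply.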

Based on this observation several estimators minimizing a (modified) empirical $L_\tau$-risk were proposed (see [5] for a survey on both parametric and non-parametric methods) for situations where $P$ is unknown, but i.i.d. samples $D := ((x_1, y_1), \dots, (x_n, y_n))$ drawn from $P$ are given. In particular, [6, 4, 10] proposed an SVM that finds a solution $f_{D,\lambda} \in H$ of

$$\arg\min_{f \in H} \; \lambda \|f\|_H^2 + \frac{1}{n} \sum_{i=1}^{n} L_\tau(y_i, f(x_i)), \tag{1}$$

where $\lambda > 0$ is a regularization parameter and $H$ is a reproducing kernel Hilbert space (RKHS) over $X$. Note that this optimization problem can be solved by considering the dual problem [4, 10], but since this technique is nowadays standard in machine learning we omit the details. Moreover, [10] contains an exhaustive empirical study as well as some theoretical considerations.

Empirical methods estimating quantiles with the help of the pinball loss typically obtain functions $f_D$ for which $\mathcal{R}_{L_\tau,P}(f_D)$ is close to $\mathcal{R}^*_{L_\tau,P}$ with high probability. However, in general this only implies that $f_D$ is close to $f^*_{\tau,P}$ in a very weak sense (see [7, Remark 3.18]), and hence there is so far only little justification for using $f_D$ as an estimate of the quantile function. Our goal is to address this issue by showing that under certain realistic assumptions on $P$ we have an inequality of the form

$$\| f - f^*_{\tau,P} \|_{L_1(P_X)} \le c_P \sqrt{\mathcal{R}_{L_\tau,P}(f) - \mathcal{R}^*_{L_\tau,P}}. \tag{2}$$

We then use this inequality to establish an oracle inequality for SVMs defined by (1). In addition, we illustrate how this oracle inequality can be used to obtain learning rates and to justify a data-dependent method for finding the hyper-parameters $\lambda$ and $H$. Finally, we generalize the methods for establishing (2) to investigate the role of $\epsilon$ in the $\epsilon$-insensitive loss used in standard SVM regression.

2 Main results

In the following $X$ is an arbitrary, non-empty set equipped with a $\sigma$-algebra, and $Y \subset \mathbb{R}$ is a closed non-empty set.
Given a distribution $P$ on $X \times Y$ we further assume throughout this paper that the $\sigma$-algebra on $X$ is complete with respect to the marginal distribution $P_X$ of $P$, i.e., every subset of a $P_X$-zero set is contained in the $\sigma$-algebra. Since the latter can always be ensured by increasing the original $\sigma$-algebra in a suitable manner, this is not a restriction at all.

Definition 2.1 A distribution $Q$ on $\mathbb{R}$ is said to have a $\tau$-quantile of type $\alpha > 0$ if there exist a $\tau$-quantile $t^* \in \mathbb{R}$ and a constant $c_Q > 0$ such that for all $s \in [0, \alpha]$ we have

$$Q\bigl((t^*, t^* + s)\bigr) \ge c_Q\, s \quad \text{and} \quad Q\bigl((t^* - s, t^*)\bigr) \ge c_Q\, s. \tag{3}$$

It is not difficult to see that a distribution $Q$ having a $\tau$-quantile of some type $\alpha$ has a unique $\tau$-quantile $t^*$. Moreover, if $Q$ has a Lebesgue density $h_Q$ then $Q$ has a $\tau$-quantile of type $\alpha$ if $h_Q$ is bounded away from zero on $[t^* - \alpha, t^* + \alpha]$, since we can use $c_Q := \inf\{ h_Q(t) : t \in [t^* - \alpha, t^* + \alpha] \}$ in (3). This assumption is general enough to cover many distributions used in parametric statistics such as Gaussian, Student's t, and logistic distributions (with $Y = \mathbb{R}$), Gamma and log-normal distributions (with $Y = [0, \infty)$), and uniform and Beta distributions (with $Y = [0, 1]$).

The following definition describes distributions on $X \times Y$ whose conditional distributions $P(\,\cdot \mid x)$, $x \in X$, have the same $\tau$-quantile type $\alpha$.

Definition 2.2 Let $p \in (0, \infty]$, $\tau \in (0, 1)$, and $\alpha > 0$. A distribution $P$ on $X \times Y$ is said to have a $\tau$-quantile of $p$-average type $\alpha$ if $Q_x := P(\,\cdot \mid x)$ has $P_X$-almost surely a $\tau$-quantile of type $\alpha$ and $b : X \to (0, \infty)$ defined by $b(x) := c_{P(\,\cdot \mid x)}$, where $c_{P(\,\cdot \mid x)}$ is the constant in (3), satisfies $b^{-1} \in L_p(P_X)$.

Let us now give some examples of distributions having $\tau$-quantiles of $p$-average type $\alpha$.

Example 2.3 Let $P$ be a distribution on $X \times \mathbb{R}$ with marginal distribution $P_X$ and regular conditional probability $Q_x\bigl((-\infty, y]\bigr) := 1/(1 + e^{-z})$, $y \in \mathbb{R}$, where $z := (y - m(x))/\sigma(x)$, $m : X \to \mathbb{R}$ describes a location shift, and $\sigma : X \to [\beta, 1/\beta]$ describes a scale modification for some constant $\beta \in (0, 1]$. Let us further assume that the functions $m$ and $\sigma$ are measurable. Thus $Q_x$ is a logistic distribution having the positive and bounded Lebesgue density $h_{Q_x}(y) = e^{-z}/(1 + e^{-z})^2$, $y \in \mathbb{R}$. The $\tau$-quantile function is

$$t^*(x) := f^*_{\tau,Q_x} = m(x) + \sigma(x) \log\Bigl(\frac{\tau}{1 - \tau}\Bigr), \qquad x \in X,$$

and we can choose $b(x) = \inf\{ h_{Q_x}(t) : t \in [t^*(x) - \alpha, t^*(x) + \alpha] \}$. Note that $h_{Q_x}(m(x) + y) = h_{Q_x}(m(x) - y)$ for all $y \in \mathbb{R}$, and $h_{Q_x}(y)$ is strictly decreasing for $y \in [m(x), \infty)$. Some calculations show

$$b(x) = \min\bigl\{ h_{Q_x}(t^*(x) - \alpha),\; h_{Q_x}(t^*(x) + \alpha) \bigr\} = \min\Bigl\{ \frac{u_1(x)}{(1 + u_1(x))^2},\; \frac{u_2(x)}{(1 + u_2(x))^2} \Bigr\} \in \Bigl[ c_{\alpha,\beta},\; \frac{1}{4} \Bigr],$$

where $u_1(x) := \frac{1 - \tau}{\tau}\, e^{-\alpha/\sigma(x)}$, $u_2(x) := \frac{1 - \tau}{\tau}\, e^{\alpha/\sigma(x)}$, and $c_{\alpha,\beta} > 0$ can be chosen independent of $x$, because $\sigma(x) \in [\beta, 1/\beta]$. Hence $b^{-1} \in L_\infty(P_X)$ and $P$ has a $\tau$-quantile of $\infty$-average type $\alpha$.

Example 2.4 Let $\tilde{P}$ be a distribution on $X \times Y$ with marginal distribution $\tilde{P}_X$ and regular conditional probability $\tilde{Q}_x := \tilde{P}(\,\cdot \mid x)$ on $Y$. Furthermore, assume that $\tilde{Q}_x$ is $\tilde{P}_X$-almost surely of $\tau$-quantile type $\alpha$. Let us now consider the family of distributions $P$ with marginal distribution $\tilde{P}_X$ and regular conditional distributions $Q_x := \tilde{P}\bigl((\,\cdot\, - m(x))/\sigma(x) \mid x\bigr)$, $x \in X$, where $m : X \to \mathbb{R}$ and $\sigma : X \to (\beta, 1/\beta)$ are as in the previous example. Then $Q_x$ has a $\tau$-quantile $f^*_{\tau,Q_x} = m(x) + \sigma(x) f^*_{\tau,\tilde{Q}_x}$ of type $\alpha\beta$, because we obtain for $s \in [0, \alpha\beta]$ the inequality

$$Q_x\bigl((f^*_{\tau,Q_x},\, f^*_{\tau,Q_x} + s)\bigr) = \tilde{Q}_x\bigl((f^*_{\tau,\tilde{Q}_x},\, f^*_{\tau,\tilde{Q}_x} + s/\sigma(x))\bigr) \ge b(x)\, s/\sigma(x) \ge b(x)\, \beta\, s.$$

Consequently, $P$ has a $\tau$-quantile of $p$-average type $\alpha\beta$ if and only if $\tilde{P}$ has a $\tau$-quantile of $p$-average type $\alpha$.
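
The closed form for $b(x)$ in Example 2.3 can be checked numerically in the simplest special case. The sketch below is an illustration under stated assumptions, not part of the paper: it fixes $m \equiv 0$ and $\sigma \equiv 1$ (the standard logistic distribution), uses only the Python standard library, and all function names are hypothetical. It compares the closed-form constant with the density at the endpoints of $[t^* - \alpha, t^* + \alpha]$ and verifies inequality (3) on a grid of values $s \in [0, \alpha]$.

```python
import math


def logistic_cdf(y: float) -> float:
    """CDF of the standard logistic distribution (m = 0, sigma = 1 assumed)."""
    return 1.0 / (1.0 + math.exp(-y))


def logistic_pdf(y: float) -> float:
    """Density e^{-y} / (1 + e^{-y})^2 of the standard logistic distribution."""
    e = math.exp(-y)
    return e / (1.0 + e) ** 2


def quantile(tau: float) -> float:
    """tau-quantile t* = log(tau / (1 - tau)) for m = 0, sigma = 1."""
    return math.log(tau / (1.0 - tau))


def b_closed_form(tau: float, alpha: float) -> float:
    """min{u1/(1+u1)^2, u2/(1+u2)^2} with sigma = 1, as in Example 2.3."""
    u1 = (1.0 - tau) / tau * math.exp(-alpha)
    u2 = (1.0 - tau) / tau * math.exp(alpha)
    return min(u1 / (1.0 + u1) ** 2, u2 / (1.0 + u2) ** 2)


def satisfies_type_alpha(tau: float, alpha: float, steps: int = 1000) -> bool:
    """Check inequality (3) with c_Q = b_closed_form on a grid of s in [0, alpha]."""
    t_star = quantile(tau)
    c = b_closed_form(tau, alpha)
    for k in range(steps + 1):
        s = alpha * k / steps
        right = logistic_cdf(t_star + s) - logistic_cdf(t_star)  # Q((t*, t*+s))
        left = logistic_cdf(t_star) - logistic_cdf(t_star - s)   # Q((t*-s, t*))
        if right < c * s - 1e-12 or left < c * s - 1e-12:
            return False
    return True


tau, alpha = 0.3, 0.5
t_star = quantile(tau)
endpoint_min = min(logistic_pdf(t_star - alpha), logistic_pdf(t_star + alpha))
print(b_closed_form(tau, alpha), endpoint_min, satisfies_type_alpha(tau, alpha))
```

The grid check is only a numerical sanity test; the inequality itself follows from bounding the density from below on the interval.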

The following theorem shows that for distributions having a quantile of $p$-average type the conditional quantile can be estimated by functions that approximately minimize the pinball risk.

Theorem 2.5 Let $p \in (0, \infty]$, $\tau \in (0, 1)$, $\alpha > 0$ be real numbers, and $q := \frac{p}{p+1}$. Moreover, let $P$ be a distribution on $X \times Y$ that has a $\tau$-quantile of $p$-average type $\alpha$. Then for all $f : X \to \mathbb{R}$ satisfying

$$\mathcal{R}_{L_\tau,P}(f) - \mathcal{R}^*_{L_\tau,P} \le 2^{-\frac{p+2}{p+1}}\, \alpha^{\frac{2p}{p+1}}$$

we have

$$\| f - f^*_{\tau,P} \|_{L_q(P_X)} \le \sqrt{2}\, \| b^{-1} \|_{L_p(P_X)}^{1/2} \sqrt{\mathcal{R}_{L_\tau,P}(f) - \mathcal{R}^*_{L_\tau,P}}.$$

Our next goal is to establish an oracle inequality for SVMs defined by (1). To this end let us assume $Y = [-1, 1]$. Then we have $L_\tau(y, \bar{t}) \le L_\tau(y, t)$ for all $y \in Y$, $t \in \mathbb{R}$, where $\bar{t}$ denotes $t$ clipped to the interval $[-1, 1]$, i.e., $\bar{t} := \max\{-1, \min\{1, t\}\}$.
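
The clipping property stated above is easy to test empirically. The following sketch is illustrative only and assumes nothing beyond the definitions in the text: `pinball_loss` and `clip` are hypothetical helper names, and the sampling ranges for $y$ and $t$ are arbitrary choices. It draws random labels $y \in [-1, 1]$ and predictions $t \in [-5, 5]$ and confirms that clipping the prediction never increases the pinball loss.

```python
import random


def pinball_loss(y: float, t: float, tau: float) -> float:
    """tau-pinball loss L_tau(y, t) = psi_tau(y - t)."""
    r = y - t
    return tau * r if r >= 0 else (tau - 1.0) * r


def clip(t: float) -> float:
    """Clip t to the interval [-1, 1]: max{-1, min{1, t}}."""
    return max(-1.0, min(1.0, t))


def clipping_never_hurts(tau: float, trials: int = 10000, seed: int = 0) -> bool:
    """Randomized check of L_tau(y, clip(t)) <= L_tau(y, t) for y in [-1, 1]."""
    rng = random.Random(seed)
    for _ in range(trials):
        y = rng.uniform(-1.0, 1.0)  # label in Y = [-1, 1]
        t = rng.uniform(-5.0, 5.0)  # arbitrary real-valued prediction
        if pinball_loss(y, clip(t), tau) > pinball_loss(y, t, tau) + 1e-12:
            return False
    return True


print(clipping_never_hurts(0.1), clipping_never_hurts(0.9))
```

The reason the check passes is that for $t > 1 \ge y$ (and symmetrically for $t < -1 \le y$) moving $t$ to the boundary of $[-1, 1]$ brings it closer to $y$ without changing the sign of $y - t$.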
