
Analytic Geometry, Kernels, RKHS, and Native Spaces

John Erickson, IIT

“There are two things you don’t want to see get made: Sausage and ” -John Erickson (I might have heard this somewhere however.)

Note to the Reader

◼ This is the latest in a series of Mathematica notebooks that some people have taken to calling "The Erickson Notebooks". By some people, I am referring to myself of course. I have used them in various forms in the Meshfree/Kernel research group here at IIT for the past two summers. As a result, I occasionally say things like "as we discussed..." or "recall that..." and end up referencing a concept that appears nowhere else in the current document. I am trying to remove all such references or replace them with something useful.

◼ If you find the material too elementary or the tone too pedantic at times, try not to judge too harshly that the author is an idiot. I would suggest waiting until the end to revisit this question and then form an opinion. Keep in mind that this was written partly to answer the question, "Can I teach this stuff to a very talented high school student?", and was also inspired by the spirit of Arnold Ross, who taught us to "think deeply about simple things."

◼ We urge the reader to play with the Manipulates in order to gain a feel for the subject. We understand that not everyone is in love with Mathematica, but the pdf version is but a mere shadow of the cdf version. While poetic and apparently hyperbolic (no pun intended), we mean this quite literally: the pdf is essentially the cdf stuck at the particular moment of time I happened to save it.

◼ Some severe limitations should be confessed to immediately: we will usually confine the discussion to positive definite kernels on a subset of the real line. This is mainly to keep the ideas, programs, and graphics as simple as possible.

◼ I have always enjoyed reading articles where the author freely gave his opinions about things, so I have done so. Take them with a grain of salt however and think for yourself. Finally, at times you might find yourself chuckling a bit. That's because I'm a pretty funny guy I think.

Introduction

An alternative title for this talk could have been "What are native spaces and what the heck is the native space norm measuring?" I used the above title because I think the best way to answer this question is to take a tour through a number of topics in mathematics ranging from the elementary to the more advanced.

We will endeavor to provide some insight into what exactly the native space norm of a positive definite kernel is measuring, while staying as elementary as possible. The native space norm is clearly designed to be a means of measuring the size of functions that are linear combinations of our given kernel evaluated at points in our space. The norm has the benefit that it comes from an inner product, but it isn't exactly clear what is being measured. In order to make things easier we will confine ourselves to dimension one, and primarily to the native spaces determined by positive definite kernels, though we will mention a few conditionally positive definite kernels since they provide nice examples. Before proceeding, however, let's ask a question.

A Question From Analytic Geometry

We all know from high school that if you take the unit circle centered at the origin and apply a nonsingular matrix A to it, you get an ellipse. In high school you learn this in a different form as the parametric equations of an ellipse. We also know that every ellipse centered at the origin is essentially the solution to a quadratic equation in two variables, which can also be defined by a matrix B. In high school you learn this when you learn about conic sections, and it involved a fair amount of trigonometry since you may not have known enough about matrices at the time. So the question is the following: precisely formulate the above statement and find the relationship between the two matrices A and B. If you don't know the answer and find it interesting, then think about it for a moment.

Remark: I actually did learn analytic geometry, but not this question, in high school, but my school had a strong math program. Unfortunately, since geometry appears to have been deprecated in the modern American high school curriculum (presumably so that students feel good about themselves), I find that many students have never had a full course in analytic geometry, which is a real pity since it makes teaching them multivariable calculus and linear algebra that much harder. Of course, this is my fault for "making it so confusing."

Some Linear Algebra

We don't intend to review all of linear algebra but simply state some results that should be familiar. You might even know these facts without realizing you know them. Incidentally, we will adopt the convention that vectors can be identified with column matrices. You should have learned that in finite dimensions, after choosing a basis...

◼ Every linear transformation L:ℝⁿ → ℝᵐ can be realized as a matrix: L(u) = A u.

◼ In particular, with m = 1, every linear functional L:ℝⁿ → ℝ can be realized as a row matrix, which can in turn be identified with a vector. In infinite dimensions this becomes one of the Riesz representation theorems for bounded linear functionals. It doesn't seem reasonable that you can understand the latter result if you never realized it in finite dimensions. By the way, the set of all linear functionals on a finite dimensional vector space is also called the dual space. It is also a vector space, and in fact it is essentially the same as (isomorphic to) the original space due to the aforementioned correspondence.

◼ Every bilinear functional b:ℝⁿ × ℝᵐ → ℝ can be realized as the matrix product b(u, v) = uᵀ B v.

◼ In particular, every (real) inner product b:ℝⁿ × ℝⁿ → ℝ can be realized as b(u, v) = uᵀ B v where the matrix B is both symmetric and positive definite (in addition to being bilinear already). Inner products are a very special type of kernel. In fact, in some sense they are the canonical type of kernel. We will denote inner products by using the "bra-ket" notation 〈u, v〉 = uᵀ B v.

◼ In a real inner product space, or pre-Hilbert space, we can define a norm on the space by taking the inner product of a vector with itself, i.e., ‖v‖² = 〈v, v〉 = vᵀ B v where B is again a symmetric positive definite matrix. It is clear (you should have seen) that an inner product will induce a norm in this fashion. Conversely, if you are given a norm in a real inner product space derived from the inner product, you can always recover the inner product using the following so-called "polarization identity":

〈u, v〉 = (1/2)(‖u‖² + ‖v‖² - ‖u - v‖²)

This is, of course, nothing more than the law of cosines from high school trigonometry, since 〈u, v〉 = ‖u‖ ‖v‖ cos(θ). There is a similar result for complex inner product spaces.

◼ You should know about eigenvalues and eigenvectors.

◼ You should know the spectral theorem: every symmetric operator on a space of dimension n has n real eigenvalues (when counted with multiplicity) and an orthogonal basis of eigenvectors. In terms of matrices, you can diagonalize a symmetric matrix and you get reals on the diagonal. Moreover, the matrix effecting the change of basis is orthogonal, so its inverse is just its transpose. This is an example of a matrix factorization: A = P Λ P⁻¹ where P is the change of basis matrix.

◼ It would be nice if you were acquainted with some additional matrix factorizations: the Jordan canonical form, LU, QR, Cholesky, and the SVD. We probably won't use the Jordan form, but you really should know it.

◼ Finally, you should realize that the set of all bilinear functionals on a pair of vector spaces is isomorphic to the dual space of the tensor product of the original two spaces. More precisely,

Bilinear(ℝⁿ × ℝᵐ, ℝ) ≃ Linear(ℝⁿ ⊗ ℝᵐ, ℝ)

This follows "trivially" from the universal mapping property of tensor products. I'm kidding of course, while also adopting the standard pompous tone in mathematics that we all learn so as to sound smarter. You don't really need to know this and I just threw it in partly to be erudite, but it is actually relevant. This result reduces the above claim about the representation of bilinear functionals (which you do need to know) to the previous claim about the representation of linear functionals, which you do know. If I have piqued your interest, I'll say a bit more. The left and right sides are both vector spaces of dimension m n (that's one of the properties of tensor products), therefore they must be isomorphic, since dimension is an isomorphism invariant in the category of vector spaces, if you want to make a mountain out of a molehill. The result is actually interesting because there is a natural isomorphism between these two vector spaces. If you want to know more about this, google "Multilinear Algebra" or, god forbid, go to the library and read a book about it. (A tiny numerical sketch of the polarization identity and the spectral theorem follows below.)
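Here is a minimal Mathematica sketch of two of the facts above. The 2×2 symmetric positive definite matrix B below is an arbitrary example chosen purely for illustration: the polarization identity recovers the inner product from the induced norm, and the spectral theorem delivers real eigenvalues and orthogonal eigenvectors.

B = {{2, 1}, {1, 3}};                      (* an arbitrary symmetric positive definite matrix *)
ip[u_, v_] := u.B.v;                       (* inner product 〈u,v〉 = uᵀ B v *)
nrm[u_] := Sqrt[ip[u, u]];                 (* the induced norm *)
u = {1, 2}; v = {3, -1};
ip[u, v] == (nrm[u]^2 + nrm[v]^2 - nrm[u - v]^2)/2          (* polarization identity: True *)
{vals, vecs} = Eigensystem[B];
{vals, Simplify[vecs[[1]].vecs[[2]]]}      (* real eigenvalues; the eigenvectors are orthogonal *)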

Some Analytic Geometry

As I mentioned at the beginning, it isn't clear that students have seen this subject, particularly if they were educated in the United States and they are young. While we won't do a complete review of the subject, we will touch on some of the high points. A question that might have occurred to you is why I have placed this section after the section on linear algebra. Simple: analytic geometry is much easier when you have linear algebra at your disposal. Otherwise it can involve lots of bare-handed, painful trigonometric calculations that provide little insight. It is amazing that you can actually get all the main results with just trig, however.

The Conic Sections

In the classical Greek theory of conic sections these curves are defined by their name: you cut a double cone with a plane and look at the resulting curve. This is easy to do in Mathematica.

ManipulateContourPlot3Dz 2 ==x 2 +y 2, a x+by+cz⩵1,{x,-3, 3}, {y,-3, 3},{z,-3, 3}, Mesh→ False, Boxed→ False, Axes→ False, ContourStyle→{Opacity[.7]},{a,-1, 1},{b,-1, 1},{c,-1, 1}


Then using the ultra slick Greek mathematics of the "Dandelin spheres" you can determine the foci of your conic section and ultimately "coordinatize" your conic section on the resulting plane. The result is that every conic section in the Euclidean (not projective) plane is given by a quadratic equation in two variables,

a x² + b x y + c y² + d x + e y + f = 0

and this is how your typical high school class defines the conic sections. One learns that there are three types (not counting degeneracies): ellipses, parabolas, and hyperbolas. One then usually learns some "canonical forms" of each of these types of conic sections, like x²/a² + y²/b² = 1 for an ellipse, that are achieved by changing coordinate systems via translations and rotations. Roughly, you rotate to get rid of the xy term and then translate to get rid of one or both of the x and y terms, depending on whether you have a parabola or not. If the conic is a parabola you must have a linear term. Finally, if your course was thorough you learned that there is an algebraic invariant for helping to detect the type of the conic section called the discriminant: b² - 4ac.

Once these ideas are internalized you can ask some questions to test your understanding. Assume we are talking about non-degenerate conic sections below. If you apply a linear transformation to an ellipse, can it become a hyperbola or parabola? If you apply a linear transformation to a parabola, can it become a hyperbola? One could probably do a horrendous amount of algebra (or, better, have Mathematica do it) and answer these questions via the discriminant, but this is not necessary. Instead these questions are essentially topological. A linear transformation is a type of continuous map, and the continuous image of a compact or connected set is still compact or connected. So, no, an ellipse, which is closed and bounded and therefore compact, cannot get sent to an unbounded and therefore non-compact hyperbola or parabola. Also, no, a parabola is connected so it can't get broken into two pieces by a continuous map, even though both curves are non-compact. We illustrate some conics with a Manipulate.

ManipulateContourPlota x2 +bxy+cy 2 +dx+ey+f⩵ 0, {x,-3, 3},{y,-3, 3}, PlotLabel→ "b 2-4ac=" <> ToStringb 2 -4ac , {a,-1, 1},{b,-3, 3},{c,-1, 1},{d,-1, 1},{e,-1, 1},{f,-1, 1}


Focus on Ellipses Now (no pun intended)

So we have that an ellipse must get sent to an ellipse. In particular we asserted that a circle must get sent to an ellipse by a nonsingular matrix. You know a special case of this from the parametric equations for an ellipse, x = a cos(t) and y = b sin(t); this is simply the image of the unit circle under the diagonal matrix {{a, 0}, {0, b}}. As we have said, there is nothing special about the matrix required for this. We illustrate with a Manipulate.

Manipulate[GraphicsRow[{ParametricPlot[{Cos[θ], Sin[θ]},{θ, 0, 2π}, Epilog→{Red, PointSize[.02], Arrow[{{0, 0},{Cos[t], Sin[t]}}]}, ImageSize→ 200], ParametricPlot[{{a, b},{c, d}}.{Cos[θ], Sin[θ]}, {θ, 0, 2π}, PlotLabel→ "Det=" <> ToString[Det[{{a, b},{c, d}}]] , Epilog→ {Red, PointSize[.02], Arrow[{{0, 0}, {{a, b},{c, d}}.{Cos[t], Sin[t]}}]}]}], {t, 0, 2π},{a,-10, 10},{b,-10, 10},{c,-10, 10},{d,-10, 10}]


Note that when the determinant is 0 you get a degenerate ellipse; when it is positive the red arrow moves in the same direction as the red arrow on the circle; but when the determinant is negative the linear transformation is orientation reversing and the image arrow goes in the opposite direction. Note that if you look closely at the Mathematica code, which I know so many of you enjoy doing, it contains the first matrix for the question I asked at the beginning. All ellipses centered at the origin can be thought of as

E = { A.(cos(θ), sin(θ)) : θ ∈ [0, 2π] },  where A = {{a, b}, {c, d}}    (1)

Let's come back to our algebraic definition of conics, but let's assume we have an ellipse

a x² + b x y + c y² + d x + e y + f = 0

By translating we can rewrite this in a new set of coordinates where we get rid of d and e:

a x² + b x y + c y² = -f

Note that we didn't bother renaming our variables or coefficients. It turns out that if you have a non-degenerate ellipse then f ≠ 0, so you may divide by it. In other words we really only have 3 degrees of freedom in these coefficients, and we get, by relabeling,

a x² + b x y + c y² = 1

But this can be written as a matrix equation!

E = { (x, y) : (x, y).{{a, b/2}, {b/2, c}}.(x, y)ᵀ = 1 }    (2)

This is all standard of course, but I find too many students who have never seen it. We can turn this form of a matrix equation for an ellipse into an illustrative Manipulate as follows:

Manipulate[
 Show[
  ContourPlot[{x, y}.{{a, b/2}, {b/2, c}}.{x, y} == 1, {x, -2, 2}, {y, -2, 2},
   AspectRatio → 1, Axes → True, AxesLabel → {x, y},
   PlotLabel → "b^2-4ac=" <> ToString[b^2 - 4 a c]],
  ParametricPlot[s Eigenvectors[{{a, b/2}, {b/2, c}}], {s, -2, 2}, PlotStyle → Red]],
 {a, -2, 2}, {b, -2, 2}, {c, -2, 2}]


A few comments and observations are in order. First of all, {{a, b/2}, {b/2, c}} is the matrix that is the second part of the question I asked at the beginning. Also, since there are no linear terms we know the curve should be an ellipse or a hyperbola. You should observe that b² - 4ac < 0 is when you get an ellipse, but not always; sometimes you get nothing. As a side benefit of this Manipulate, you should look at the Mathematica code and notice that the two red lines are the major and minor axes of the ellipse and are therefore perpendicular. Why is this? All of these questions are related and they all come down to the spectral theorem. The matrix {{a, b/2}, {b/2, c}} is symmetric, therefore the spectral theorem tells us that

◼ The eigenvalues are real. Counting multiplicity there must be two of them (for a non-symmetric matrix they could have been complex).

◼ The eigenvectors are orthogonal. One characterization of the eigenvalues involves the maximum and minimum distance from the origin, which equates to the major and minor axes. If you examine the code above, the red lines are the associated one dimensional eigenspaces.

◼ Without going into details, you should recall that the determinant is a "similarity" invariant of a matrix and in fact it is really the product of the eigenvalues. Here the determinant is a c - b²/4 = -(b² - 4ac)/4 > 0, which implies the eigenvalues have the same sign. Since a c > b²/4 we also have that a and c must have the same sign.

Now we introduce some language: a symmetric real matrix in which all of the eigenvalues are positive is called positive definite. If the eigenvalues are all negative, the matrix is called negative definite. Otherwise, the matrix is called "indefinite". We made that last one up. In short, all origin-centered ellipses are the solutions to

(x, y).{{a, b/2}, {b/2, c}}.(x, y)ᵀ = 1

where the matrix is positive definite. By the above discussion this comes down to checking that the determinant is positive and a > 0, as the short check below illustrates. We actually address this same issue again in the discussion below by way of good review.
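A quick sketch of that criterion on one arbitrary example (the numbers below are not special): the eigenvalues of {{a, b/2}, {b/2, c}} come out positive exactly when the determinant is positive and a > 0.

With[{a = 2, b = 1, c = 3},
 {Eigenvalues[N[{{a, b/2}, {b/2, c}}]], a c - b^2/4 > 0 && a > 0}]    (* roughly {{3.21, 1.79}, True} *)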

Remark: You should have encountered these notions in your multivariable calculus course, but they may have been disguised a bit since you didn't know linear algebra at the time. When you did Taylor's theorem in two variables to expand the expression f(x+h, y+k) you saw the following, where we include the Mathematica code in case you're interested. We first generate sufficient terms of the Taylor series.

series = Normal[Series[#, {k, 0, 2}]] & /@ (Series[f[x + h, y + k], {h, 0, 2}] // Normal)

f[x, y] + k f^(0,1)[x, y] + (1/2) k^2 f^(0,2)[x, y] + h f^(1,0)[x, y] + h k f^(1,1)[x, y] + (1/2) h k^2 f^(1,2)[x, y] + (1/2) h^2 f^(2,0)[x, y] + (1/2) h^2 k f^(2,1)[x, y] + (1/4) h^2 k^2 f^(2,2)[x, y]

Then we select out the quadratic terms and sum them. The code used in the above line is a nice application of the functional paradigm called "mapping", and the code used below is a nice application of Mathematica's pattern recognition capabilities and is a really marvelous feature. Of course, Mathematica is at its core based on patterns and rules, so this is no big deal to the initiated, "...and we are initiated, aren't we...?". I once told my class that I would give them extra credit if they knew what movie I just quoted from. They did not get any extra credit.

Factor[Total[Cases[series, _ Derivative[p_, q_][f][x, y] /; p + q == 2]]]

(1/2) (k^2 f^(0,2)[x, y] + 2 h k f^(1,1)[x, y] + h^2 f^(2,0)[x, y])

Now if you stare at this long enough, you will realize that the quadratic terms of the Taylor series are actually just

H(x,y)(f)(h, k) = (1/2) (h, k).{{∂²f/∂x², ∂²f/∂x∂y}, {∂²f/∂y∂x, ∂²f/∂y²}}.(h, k)ᵀ

or, in short, the second derivative for a function with domain in ℝ². Clairaut's theorem guarantees that "mixed partials commute", so this matrix must be symmetric. Therefore, by the spectral theorem, this matrix is guaranteed to have real eigenvalues and, moreover, the eigenvectors are orthogonal. As a result, if you are at a critical point, you have a minimum when both eigenvalues are positive (positive definite), a maximum when they are both negative (negative definite), and a saddle point (no max/min) when the eigenvalues have opposite signs. When they teach this in a non-linear-algebra-based multivariable calculus class, they just tell you to check that the determinant of the Hessian matrix is positive, so that the eigenvalues have the same sign, and then to check the sign of ∂²f/∂x², without explaining anything.

Amusing anecdote time. I was recently at a talk where the presenter kept pronouncing "the Hessian" as "the Haitian". I didn't want to be a jerk, but my duty to educate combined with the fact that I could take it no longer led me to say the following: "While I am glad you are referring to this matrix after my mother's homeland, the word is pronounced in American as H-e-s-s-i-a-n." I didn't think of saying "American" in real time unfortunately, as that would have been comic gold. Incidentally, my father proudly claims the partial ancestry of a Hessian deserter. The Hessians, if you don't know, were the German mercenaries hired by the British (among others) to help fight the Americans in the Revolutionary War. To be precise, this was before Bismarck united Germany, and apparently many of these soldiers were from "Hesse". I just read that last part in Wikipedia. If you want to see what the Hessians looked like I suggest you rent the movie "Sleepy Hollow" starring Christopher Walken.
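Here is a small sketch of the second derivative test in Mathematica; the function g below is just an arbitrary example, not anything from the discussion above.

g[x_, y_] := x^3 - 3 x + y^2;
hess[x_, y_] = D[g[x, y], {{x, y}, 2}];                    (* the Hessian matrix of g *)
crit = Solve[{D[g[x, y], x] == 0, D[g[x, y], y] == 0}, {x, y}];
Map[Eigenvalues[hess[x, y] /. #] &, crit]                  (* {6, 2}: a minimum; {-6, 2}: a saddle *)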

Back to the Main Question Asked at the Beginning

Let's return now to the question asked at the beginning. Hopefully you have jogged your memory while reading the above list and you have properly formulated the problem. Suppose you have been given an ellipse in terms of the image of the unit circle under a matrix A, as in eq. (1):

E = { A.(cos(θ), sin(θ)) : θ ∈ [0, 2π] },  where A = {{a, b}, {c, d}}

What is the matrix B that gives us the same ellipse via eq. (2),

E = { (x, y) : (x, y).B.(x, y)ᵀ = 1 },  where B = {{p, q/2}, {q/2, r}} ?

If (x, y) is on the ellipse then A.(cos(θ), sin(θ)) = (x, y)ᵀ for some θ, therefore we have ‖A⁻¹.(x, y)ᵀ‖² = 1. Now suppose A⁻¹ = Q R is the QR factorization of A⁻¹. Therefore we have ‖Q R.(x, y)ᵀ‖² = 1. This gives us

1 = (x, y).Rᵀ Qᵀ Q R.(x, y)ᵀ = (x, y).Rᵀ R.(x, y)ᵀ.

Now let B = Rᵀ R and we're done, since this matrix is both symmetric and positive definite. To summarize, we have

B = QRdecomp(A⁻¹)ᵀ QRdecomp(A⁻¹)

Of course, we are talking about the R part of the QR decomposition. This can be implemented as follows.

Manipulate[Show[ {ContourPlot[{x, y}.Transpose[QRDecomposition[Inverse[{{a, b},{c, d}}]][[2]]]. QRDecomposition[Inverse[{{a, b},{c, d}}]][[2]].{x, y}⩵ 1,{x,-10, 10}, {y,-10, 10}, PlotRange→ 10], ParametricPlot[{{a, b},{c, d}}.{Cos[θ], Sin[θ]}, {θ, 0, 2π}, PlotStyle→{Red, Dashed}, PlotRange→ All]}], {a,-3, 5},{b,-2, 5},{c,-1, 5},{d, 1, 5}]
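If you prefer a quick non-interactive check of the same formula, here is a minimal sketch; the matrix below is an arbitrary nonsingular example.

Module[{A = {{2., 1.}, {-1., 3.}}, R, B},
 R = QRDecomposition[Inverse[A]][[2]];                       (* the R factor of the QR decomposition of A⁻¹ *)
 B = Transpose[R].R;
 Table[Chop[(A.{Cos[θ], Sin[θ]}).B.(A.{Cos[θ], Sin[θ]}) - 1], {θ, 0., 2 π, π/4}]]
(* every point of the image ellipse satisfies the quadratic equation, so the list is all zeros *)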


Now we ask the question in reverse and suppose we have been given an ellipse as

E = { (x, y) : (x, y).B.(x, y)ᵀ = 1 }

where B is a symmetric and positive definite matrix. Therefore B has a Cholesky factorization B = Cᵀ C, so we have (x, y).Cᵀ C.(x, y)ᵀ = 1, which gives us ‖C.(x, y)ᵀ‖² = 1. Therefore there exists θ such that C.(x, y)ᵀ = (cos(θ), sin(θ))ᵀ. Since C is invertible, let A = C⁻¹ and we have A.(cos(θ), sin(θ))ᵀ = (x, y)ᵀ. To summarize, we have

A = Cholesky(B)⁻¹

This formula is implemented in the Manipulate below.

b b ManipulateShowContourPlot{x, y}.a, , , c.{x, y}⩵ 1, 2 2 {x,-3, 3},{y,-3, 3}, Axes→ True, PlotRange→3, ParametricPlot b b InverseCholeskyDecompositiona, , , c.{Cos[θ], Sin[θ]}, 2 2 {θ, 0, 2π}, PlotStyle→{Red, Dashed}, PlotRange→2,

PlotLabel→ "b 2-4ac=" <> ToStringb 2 -4ac,{a, 1, 2},{b, 0, 2},{c, 1, 2}
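And here is a small round-trip sanity check tying the two formulas together (a sketch with an arbitrary symmetric positive definite B): go from B to A = Cholesky(B)⁻¹, then recover B from A via the QR route.

Module[{B = {{2., 0.5}, {0.5, 1.}}, A, R},
 A = Inverse[CholeskyDecomposition[B]];          (* maps the unit circle onto the ellipse of B *)
 R = QRDecomposition[Inverse[A]][[2]];           (* R factor of the QR decomposition of A⁻¹ *)
 Chop[Transpose[R].R - B]]                       (* the zero matrix: we recover B *)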


So we have solved the analytic geometry question we posed at the beginning. If the calculation is not convincing, the Manipulate should be. The answer is interesting in its own right since matrix factorizations provide a simple and decisive answer, but this result will also be useful later.

Kernels in General

When you hear the word "kernel" in mathematics you need to know the context. The problem is that the word is overused. While it can mean the same thing as the notion of null space, in our context a general kernel is simply a function K of two variables from a space Cartesian "producted" with itself to ℝ or ℂ:

K:Ω × Ω → ℝ or ℂ

It isn't exactly clear to me what hypotheses must be imposed on Ω. For example, if you intend to do integration as in the subsection that follows, you need Ω to be some kind of measure space. It usually just seems that Ω is a (most often open) subset of ℝᵈ. In this document we will just have Ω ⊂ ℝ. In fact Ω will usually just be an interval or the whole real line.

Integral Type Operators

One of the things you are most likely to do with a kernel is "integrate or sum against it" to form another function or vector of interest. For example, given a kernel defined on [a, b] we can define an operator or transform that looks like

(T f)(x) = ∫_a^b f(y) K(x, y) dy

The function K(x, y) is called the kernel function or sometimes the nucleus. In fact, kernels need not be defined on subsets of the domain ℝ×ℝ, and indeed the kinds of kernels we are interested in will be defined on pairs of more elaborate sets, but the first point is that they are defined on pairs of sets. A concrete example of an integral transform and a kernel that any student of ODEs has encountered is the Laplace transform, given by

(ℒf)(p) = ∫_0^∞ f(s) ⅇ^(-s p) ds

Here the kernel is the function K:ℝ×ℝ⟶ℝ given by K(p, s) = ⅇ^(-s p). The basic idea behind a general kernel has even more elementary origins: matrix multiplication! When we define a linear transformation L:ℝⁿ⟶ℝⁿ by a matrix,

L(x) = A x

this is just shorthand for

(L(x))_i = ∑_{j=1}^n a_{ij} x_j = ∑_{j=1}^n x_j a_{ij}

which we could write as

L(x)(i) = ∑_{j=1}^n x(j) A(i, j)

This looks a lot like our integral transform equation, except now f is x and K is A. We have simply removed the shorthand of the subscripts to reveal the true functional nature of all these indexed expressions. After all, a vector x in ℝⁿ is really a function x:{1, 2, 3, …, n}⟶ℝ and a matrix is really just a function A:{1, 2, 3, …, n}×{1, 2, 3, …, n}⟶ℝ. We have also replaced the integral with a sum, obviously. The shorthand way to say all of this is to say that the "discretization" of an integral transform is just matrix multiplication. This may seem strange if you are not used to this way of thinking, but the truth is that part of the goal of modern mathematics is to show that things that seem superficially different are actually the same when looked at properly. From this point of view kernels are functions of two variables that you "integrate against" to turn a given function into another function (presumably one you are interested in or is more useful).
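Here is a minimal numerical sketch of that last statement, discretizing the Laplace transform kernel K(p, s) = ⅇ^(-s p) with a simple Riemann sum. The grid, the truncation of [0, ∞), and the test function f(s) = ⅇ^(-s) (whose transform is 1/(p+1)) are all arbitrary choices made for illustration.

ds = 0.01; svals = Range[0., 30., ds];              (* truncate and discretize [0, ∞) *)
pvals = {1., 2., 3.};
kmat = Outer[Exp[-#2 #1] &, pvals, svals];          (* the "kernel matrix": entries ⅇ^(-s p) *)
fvals = Exp[-svals];                                (* samples of the test function *)
ds*(kmat.fvals)                                     (* matrix-vector product ≈ {1/2, 1/3, 1/4} *)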

The Hypotheses on, and the "Real Point" of, our Kernels

It is valuable to have as wide a perspective as possible on what constitutes a kernel, but since we will use kernels to do interpolation, we will see later that it will be desirable to insist that our kernels are symmetric and positive definite, or conditionally positive definite, though for simplicity we will usually focus on the former. Symmetry just means K(x, y) = K(y, x). The definition of positive definite is a bit more complicated and is given below.

Definition: A function K:Ω×Ω→ℝ is called positive definite if for all n, all sets of points x_1, ..., x_n ∈ Ω, and all c ∈ ℝⁿ we have

∑_{j=1}^n ∑_{k=1}^n c_j c_k K(x_j, x_k) ≥ 0.

We will talk about the motivations for this later. Finally, we are not clear what we may conclude about a kernel when we impose the above two requirements, so just in case we will also insist that K is continuous. There are additional properties we will often impose on our kernels that are not necessary for the definition but make for a simpler theory: radial symmetry and translational invariance. Our kernels are then of the form K(x, y) = Φ(‖x - y‖), and this largely turns a multivariate problem into a univariate one in which Fourier transform methods can be applied. In this special case our kernels are called radial basis functions. Note that if we assume translation invariance we must be talking about globally defined kernels.

After all this talk about hypotheses and properties it is now confession time. The truth is we will do what many other people do, and most of the time our kernel will be the Gaussian: K(x, y) = ⅇ^(-ϵ²‖x-y‖²) (note that we have included a shape parameter). Finally, we will even further restrict our kernels so that Ω ⊂ ℝ, so K(x, y) = ⅇ^(-ϵ²(x-y)²) and there is no need for norms or even absolute values.

This brings us to the real point of kernels for us. We've talked about what you can do with them and some hypotheses, but we haven't really said what the real point is, and I don't think this is said as often as it should be: K(x, y) describes "a weight factor at x which depends on x's relationship to y". This is easiest to understand for positive radial basis functions with K(x, x) = 1, like the Gaussian. K(x, y) then loosely represents how close x is to y, with some possible "drop off factor". If x = y you get 1, and if x is far from y you get something near 0. The less you care about far away points, the more you should localize your basis functions. Another way to think about it is that the basis functions are all talking to each other, and they "communicate" less if you localize them more. Incidentally, this corresponds to making the shape parameter high.

Remark: I am saying "our" and "us" because many other important kernels exist for other purposes that we won't mention. The Dirichlet kernel arises, for example, in understanding the summation of Fourier series.
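As a quick numerical illustration (a sketch on arbitrary points), the Gaussian kernel matrix built on a handful of distinct points is symmetric and has only positive eigenvalues, which is exactly what (strict) positive definiteness promises.

pts = {-1.3, -0.2, 0.4, 1.1, 2.7};                      (* arbitrary distinct points *)
gauss[x_, y_] := Exp[-(x - y)^2];                       (* the Gaussian kernel with ϵ = 1 *)
gmat = Outer[gauss, pts, pts];
{SymmetricMatrixQ[gmat], Eigenvalues[gmat]}             (* True, and all eigenvalues are positive *)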

localization[phi_, ϵ_] := Module[{c, xvals},
  xvals = {-1, 0, 1};
  Plot[(phi[ϵ x, ϵ #] & /@ xvals), {x, -2, 2},
   PlotRange → {{-2, 2}, {-2, 2}}, AspectRatio → 1]]

Manipulatelocalization[phi,ϵ], {{ϵ, 1}, .01, 20},

2 phi, Exp-(#1-#2) 2 &, "kernel",Exp-(#1-#2) 2&→ TraditionalForm"ⅇ -(x-y) ", Exp[- Abs[#1-#2]]&→ TraditionalForm"ⅇ - x-y ", 1 1 &→ TraditionalForm" " 1+(#1-#2) 2 1+(x-y) 2


Visualizing Kernels, Cross Sections, and Linear Combinations of Same

If you want to visualize a kernel the easiest way is to just do a 3D plot. Later we will be taking the "cross sections" of our functions to use them as the basis of a vector space of functions. A cross section is just K(a, x) for some fixed a, and they are also easy to visualize. Finally, we will look at some random linear combinations of kernel cross sections.

Some Examples of Kernels

Obviously we can't graph infinite domains in the examples below, and we have included some examples that can be extended to all of ℝ and some that are defined only on bounded domains.

2 Plot3Dⅇ-(x-y) ,{x,-1, 1},{y,-1, 1} analytic_geom_kernels_native_spaces.nb 17

Plot3Dⅇ-Abs[x-y],{x,-1, 1},{y,-1, 1}

Plot3D[Min[x, y], {x, 0, 1}, {y, 0, 1}]

Plot3D[1/(x^2 - y^2) (2 x Cos[y] Sin[x] - 2 y Cos[x] Sin[y]), {x, 0.01, 1}, {y, 0.01, 1}] // Quiet

Plot3D[1/2 + (3 x y)/2 + 5/8 (-1 + 3 x^2) (-1 + 3 y^2) + 7/8 (-3 x + 5 x^3) (-3 y + 5 y^3) +
  9/128 (3 - 30 x^2 + 35 x^4) (3 - 30 y^2 + 35 y^4) +
  11/128 (15 x - 70 x^3 + 63 x^5) (15 y - 70 y^3 + 63 y^5), {x, -1, 1}, {y, -1, 1}]

What can we conclude about kernels from a visual inspection? First of all it is clear from visual inspection that the line y=x is a line of symmetry so our kernel appears to be symmetric, k(x,y)=k(y,x).

We also observe that we have positivity along the diagonal, k(x,x) ≥ 0. Note that the last kernel is not positive everywhere.

For these kernels it appears that the maximum of the function also occurs along the diagonal.

These kernels are also all continuous on their domains. Note that for the fourth example we need to “fill” the removable discontinuity at the origin.

Cross Sections of Kernels

The Manipulate below is crude, but gets the idea across. By the way, you might ask why, after we said that we were focusing mostly on radial basis functions and especially the Gaussian, we are talking about "cross sections", since we have already illustrated some rbf kernels in the Manipulate above. Simple, and I am being rather pedantic: not all important kernels are radial! In the Manipulate below the first three kernels are popular rbf kernels but the min kernel is not, not even a little bit.

Manipulate Show[ParametricPlot3D[{s, t, k[s, t]},{s,-1, 1},{t,-1, 1}, Mesh→ False, PlotStyle→ Opacity[.5], Boxed→ False, Axes→ False, BoxRatios→{1, 1, 1}, PerformanceGoal⧴ "Quality"], ParametricPlot3D[{s, c, k[s, c]},{s,-1, 1}, PlotStyle→{Thickness[.01], Red}, PerformanceGoal⧴ "Quality"]], {k, Abs[#1-#2]&},Abs[#1-#2]&→ TraditionalForm["abs(x-y)"], 2 Exp-(#1-#2) 2&→ TraditionalForm"ⅇ -(x-y) ", Exp[- Abs[#1-#2]]&→ TraditionalForm"ⅇ - x-y ", Min[#1,#2]&→ TraditionalForm["min(x,y)"],{c,-1, 1}


Linear Combinations of Our Cross Section Functions

Pick your favorite kernel (or add your own!) and click on it to see what a random linear combination looks like. We only illustrate this for the Gaussian, Matérn, and inverse multiquadric kernels, since the other ones have the potential to get too large to see.

rndLinCombBasisFuncs[phi_]:= Module[{c, xvals}, xvals= Range[-4, 4, 1]; c= RandomReal[{-2, 2}, 9]; Plot[(phi[ x,#]&/@ xvals).c,{x,-5, 5}, PlotRange→ {{-5, 5}, {-5, 5}}, AspectRatio→1]]

ManipulaterndLinCombBasisFuncs[phi],

2 phi, Exp-(#1-#2) 2 &, "kernel",Exp-(#1-#2) 2&→ TraditionalForm"ⅇ -(x-y) ", Exp[- Abs[#1-#2]]&→ TraditionalForm"ⅇ - x-y ", 1 1 &→ TraditionalForm" " 1+(#1-#2) 2 1+(x-y) 2

-x-y 2 - x-y 1 kernel ⅇ ⅇ 1+x-y 2

4

2

-4 -2 2 4

-2

-4

Native Spaces

In the section above we took linear combinations of the cross sections of our kernels. Therefore, by taking all possible linear combinations of all possible cross sections we may form a vector space. Formally, the set

F_K(Ω) = { ∑_{j=1}^n a_j K(., x_j) : n ∈ ℕ, a_j ∈ ℝ, x_j ∈ Ω }

is a vector space.

Theorem: Suppose that {x_1, x_2, ..., x_n} ⊂ Ω and K:Ω×Ω→ℝ is a symmetric positive definite kernel. Then

〈∑_{k=1}^m a_k K(., x_k), ∑_{j=1}^n b_j K(., x_j)〉 = ∑_{k=1}^m ∑_{j=1}^n a_k b_j K(x_j, x_k)

defines an inner product on the set of all linear combinations of the kernel functions K(., x_j). This makes the set F_K(Ω) into a pre-Hilbert space. This inner product "induces" a norm in the usual way as

‖∑_{k=1}^m a_k K(., x_k)‖² = ∑_{k=1}^m ∑_{l=1}^m a_k a_l K(x_k, x_l) ≥ 0

and the native space of the kernel is just the completion of this pre-Hilbert space, which we denote by ℋ_K. The induced native space norm is written as ‖f‖_ℋ. We will focus on understanding the induced norm, since the norm determines the inner product anyway, as we will see shortly. In other words, the "native space" induced by a kernel K is made of elements of the form ∑_{k=1}^n c_k K(., x_k) and the limits of such elements in the native space norm.

Note: It is important to realize that though the elements of our native spaces are linear combinations of kernels or limits thereof, we are not saying that the coefficients necessarily arise from the solution to an interpolation problem. In other words, the definition of native spaces in no way requires referencing interpolation. That said, the native space of a kernel is a natural setting in which to answer questions about interpolation and approximation with linear combinations of kernels.
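A small numeric sketch of this inner product for the Gaussian kernel; the two elements below, their centers, and their coefficients are arbitrary choices for illustration.

ker[x_, y_] := Exp[-(x - y)^2];
xc = {-1., 0., 2.}; ca = {1., -2., 0.5};             (* centers and coefficients of the first element *)
yc = {0.5, 1.5};    cb = {2., 1.};                   (* centers and coefficients of the second element *)
ca.Outer[ker, xc, yc].cb                             (* 〈Σ a_j K(.,x_j), Σ b_k K(.,y_k)〉 = Σ_j Σ_k a_j b_k K(x_j, y_k) *)
ca.Outer[ker, xc, xc].ca                             (* the induced squared norm of the first element; always ≥ 0 *)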

Reproducing Kernel Theory

This section is more just a smattering of a few key definitions and theorems. This definition is usually credited to Aronszajn.

Definition: H is an RKHS (reproducing kernel Hilbert space) if it is a Hilbert space of functions with domain Ω such that there is a kernel K, K:Ω×Ω⟶ℝ, with the additional properties that

(1) K(., x) ∈ H for all x ∈ Ω,

(2) f(x) = 〈f, K(., x)〉_H for all f ∈ H and all x ∈ Ω.

Recall, earlier we talked about the "point evaluation functional". We now introduce some notation: δ_x(f) = f(x), where δ_x : H → ℝ is the linear functional which just means "plug in x". Now for some theorems without proofs.

Theorem: H is an RKHS in the above framework iff the point evaluation functionals δ_x are all bounded (equivalently, continuous) linear functionals.

Theorem: If H is an RKHS with kernel K, then we have

K(x, y) = 〈K(., y), K(., x)〉_H  for x, y ∈ Ω
K(x, y) = K(y, x)  for x, y ∈ Ω
K(x, x) ≥ 0  for all x ∈ Ω
|K(x, y)| ≤ √(K(x, x)) √(K(y, y))
if ‖f - f_n‖ → 0 as n → ∞, then |f(x) - f_n(x)| → 0 for all x ∈ Ω

The second property says that kernels are indeed symmetric, as observed. The third property says that kernels are indeed non-negative on the diagonal, as we observed. The fourth property is essentially a generalized Cauchy-Schwarz inequality, and in this way kernels are essentially generalized inner products. The last property is crucial and says that convergence in norm implies pointwise convergence. This property is in fact one of the main reasons we focus on the RKHS framework.

Note: It is easy to become confused about the word positive when discussing kernels. In particular, positive definite kernels need not be positive valued. Many important kernels are positive, but all kernels are merely non-negative on the diagonal.

This leads to a vitally important theorem.

Theorem: Suppose H is an RKHS with kernel K:Ω×Ω→ℝ. Then K is positive definite. Moreover, K is strictly positive definite iff the point evaluation functionals are linearly independent.

The easy direction of this result comes down to the following calculation. Given H, an RKHS with kernel K:Ω×Ω→ℝ, we have

∑_{j=1}^n ∑_{k=1}^n a_j a_k K(x_j, x_k) = ∑_{j=1}^n ∑_{k=1}^n a_j a_k 〈K(., x_j), K(., x_k)〉_H
 = 〈∑_{j=1}^n a_j K(., x_j), ∑_{k=1}^n a_k K(., x_k)〉_H
 = ‖∑_{j=1}^n a_j K(., x_j)‖²_H ≥ 0

So K is positive definite. Strict positive definiteness is the key property we require of our kernels in order to guarantee the existence of a unique solution to the interpolation problem.
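Here is a numerical sketch of how the reproduction property and the generalized Cauchy-Schwarz inequality combine to control pointwise values: for f = Σ c_j K(., x_j) one has |f(x)| = |〈f, K(., x)〉| ≤ ‖f‖ √K(x, x). The Gaussian kernel, the centers, the coefficients, and the evaluation point below are all arbitrary choices.

ker[x_, y_] := Exp[-(x - y)^2];
ctrs = {-1., 0.3, 1.7}; coefs = {0.5, -1., 2.};
fEl[x_] := coefs.(ker[#, x] & /@ ctrs);                  (* f = Σ c_j K(., x_j) *)
nsNorm = Sqrt[coefs.Outer[ker, ctrs, ctrs].coefs];       (* ‖f‖ computed from the Gram matrix *)
x0 = 0.8;
Abs[fEl[x0]] <= nsNorm Sqrt[ker[x0, x0]]                 (* True: the native norm bounds pointwise values *)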

Motivations for the Definition of the Native Space Inner Product

The definition of the native space seems very natural on the level of vector spaces, but the inner product definition, which is its genius, seems rather strange and, as we said at the beginning, it is not clear what we are measuring.

If you already have the RKHS notion in mind, the definition given above can be thought of as a consequence of two requirements: bilinearity, which all real inner products must obey, and the so-called reproduction property 〈f, K(., x)〉 = f(x) of the RKHS framework. This can be thought of as a generalized convolution in which K(., x) plays the role of the delta function δ_x. The desire to enforce reproduction implies that we must have 〈K(., x), K(., y)〉 = K(x, y), and the rest of the definition follows from bilinearity. Note that it is a powerful assumption to make that we have a Hilbert space of functions, because this implies the symmetry and positive definiteness of K, as we saw above. Of course, the problem is that the beginner is unlikely to find this particularly enlightening, since they usually don't know what convolution or the RKHS framework is.

Is there another way we can motivate this choice of an inner product that is perhaps more natural? Perhaps. As we mentioned before, every inner product on a vector space is, once you have coordinatized, given by 〈u, v〉 = uᵀ B v where B is a fixed symmetric and positive definite matrix. Now the a_1, a_2, ..., a_n play the role of the coordinates for the elements ∑_{j=1}^n a_j K(., x_j) of our vector space, as the K(., x_j) are our basis vectors. Now we say to ourselves that we would like to create an inner product for the subspace generated by the K(., x_j), j = 1, ..., n, and since we have assumed already that K is symmetric and positive definite, a natural choice for B would be B_{i,j} = K(x_i, x_j). After all, what else could it be? The matrix K(x_i, x_j) essentially "encodes" and "packages" all the information on the locations of our points of interest into a standard tool: matrices. Moreover, because we have insisted on our kernel being strictly positive definite and symmetric (which was quite natural), these matrices are even more useful. I find this viewpoint provides some helpful intuition; if you don't, you can safely ignore it and rely on the formalism.

Some Examples Which the Native Space Norm Generalizes

Generalizing is fun once you get the hang of it, but it is fairly easy to go a bit crazy and simply generalize for the sake of generalizing. Jean Dieudonné of Bourbaki fame argues this forcefully with respect to category theory, I seem to recall. I have personally found that it is also easy to generalize incorrectly, although I believe I am in fairly good company there. Finally, it is possible to generalize rather pointlessly. You still get to write papers and get tenure I guess. One test of whether an idea is worth its salt as a generalization is to see what particular ideas get subsumed under it. We will cheat a bit right now because our memory is failing us, and only talk about three examples, two of which are based on conditionally positive definite kernels. Honestly, I have no idea at this juncture what, if any, classical function space corresponds to the native space norm of the Gaussian. Embarrassing, I know. Since the time of the original writing I have been informed by Professor Fasshauer via personal communication that the native space of the Gaussian is not, in fact, a classical function space and that the inner product can be defined in terms of an infinite pseudo-differential operator. I guess that sort of makes sense...not really.

◼ If we use the kernel K(x, y) = |x - y| to interpolate a function on the interval [a, b], our interpolants are piecewise linear splines (see the little sketch after this list) and ‖f‖_ℋ = ‖f′‖_2. Actually, this is a conditionally positive definite kernel, so this is in fact a seminorm, with an ambiguity of the addition of constants. In fact, the native space is really just the Sobolev space H¹[a, b] "modded out" by constants. If "modding" is too informal, this is the process known as "quotienting", which in this case means we consider two functions the same if they differ by a constant. One potentially subtle issue is that Sobolev spaces are usually defined on open subsets of Euclidean space like (a, b) and we wish to have our functions defined on [a, b]. This is fine since we are in dimension one and the Sobolev embedding theorem implies that all the elements of H¹(a, b) are continuous, and we may therefore continuously extend them to a unique continuous function on [a, b].

◼ If we use the kernel K(x, y) = |x - y|³, our interpolants are piecewise cubic splines and ‖f‖_ℋ = ‖f″‖_2. Again, this is a conditionally positive definite kernel, so this is a seminorm with an ambiguity of the addition of first degree polynomials. As above, for all the same reasons, the native space is really just the Sobolev space H²[a, b] "modded out" by first degree polynomials.

◼ If we use the kernel K(x, y) = ⅇ^(-|x-y|), then ‖f‖_ℋ = ‖ f̂(ω) √(1 + ω²) ‖_2 up to a constant multiple. We have abused notation here by putting the ω inside the norm. This kernel is positive definite so there is no ambiguity. This native space is metrically really just the Sobolev space H¹(ℝ).

We don't wish to suggest that all native spaces are really just quotiented Sobolev spaces, although it has been argued by others that they can be regarded as generalized Sobolev spaces where we are concerned not merely with the smoothness of the functions in the space but also with their "peakiness" (e.g. Fasshauer). The "peakiness" is usually controlled by a so-called shape parameter. This has interesting implications because it means that although it might be fine to consider a given function as in the same smoothness class or Sobolev space as another function, for the purposes of kernel approximation it might be wise to regard them as in different subspaces of a given Sobolev space, where each subspace is determined by a different shape parameter perhaps. In fact, in the "fog" that constitutes my own "research" I vaguely perceived this distinction when I finally said to myself, "just because a function is wiggly does not mean it's not smooth, and vice versa", or in other words: stop conflating smoothness and wiggliness. Unfortunately, this insight came rather late in the game. Finally, to get back to the issue of why I have only mentioned Sobolev spaces: that's all I remember off the top of my head. The details of the native spaces resulting from thin plate splines, for example, being Beppo Levi spaces can be found in the books by Buhmann, Fasshauer, Wendland or the papers by Schaback.
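Here is the promised sketch for the first bullet: interpolating arbitrary data with the kernel K(x, y) = |x - y| (with no added constant term, which happens to work for distinct points in one dimension) visibly produces a piecewise linear interpolant.

nodes = {-2., -1., 0.5, 2.}; vals = {1., -1., 2., 0.};      (* arbitrary data *)
amat = Outer[Abs[#1 - #2] &, nodes, nodes];
cvec = LinearSolve[amat, vals];
interp[x_] := cvec.(Abs[x - #] & /@ nodes);                 (* s(x) = Σ c_j |x - x_j| *)
Plot[interp[x], {x, -3, 3}, PlotRange → All,
 Epilog → {Red, PointSize[.02], Point[Transpose[{nodes, vals}]]}]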

Comment about Subspaces

Due to feedback from the seminar, the speculative comment made above about subspaces, which the author is solely responsible for and is not attributing to anybody else, is currently under internal review.

Interpolation Using Kernels

First we introduce the interpolation problem and discuss the kernel method of finding the interpolant. The discussion is very similar to the polynomial interpolation problem discussion, but the Mairhuber-Curtis theorem means that we can't use a data-independent basis if we wish to do interpolation in arbitrary dimension with arbitrary point configurations. Our basic problem is as follows: we are given a list of points and we want to fit a function which goes exactly through those points:

Given {(x_1, y_1), (x_2, y_2), (x_3, y_3), ..., (x_n, y_n)} we want to find a function s such that s(x_i) = y_i

We will actually assume that our y values come from a function f and that we are simply sampling some of the domain values. This can be made explicit by writing our interpolating function as s_f(x_i) = y_i. This is a familiar problem of course, and when one first learns of it, the domain space is usually one dimensional and the solution by polynomials is given. We will follow the same ideas except now our basis functions are data dependent. In other words, we replace K_j(x) = x^j with K(x, x_j). This gives us the linear system

A c = y,  where A is the n × n matrix with entries A_{ij} = K(x_i, x_j), c = (c_1, ..., c_n)ᵀ, and y = (y_1, ..., y_n)ᵀ.

If K is a strictly positive definite function then this matrix is guaranteed to be invertible, and we obtain c = A⁻¹ y, which gives us an interpolant of

s(x) = ∑_{j=1}^n c_j K(x, x_j)

We will assume that our values come from a function and also that our data points depend on our n. So our interpolants will be

s_f(x) = ∑_{j=1}^n c_j K(x, x_j)  where  s_f(x_j) = f(x_j) = y_j
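A minimal, non-interactive sketch of this procedure with the Gaussian kernel (the data and the shape parameter ϵ = 1 are arbitrary):

ker[x_, y_] := Exp[-(x - y)^2];
nodes = {-2., -0.5, 1., 2.5}; yvals = {0., 1., -1., 0.5};       (* arbitrary data *)
amat = Outer[ker, nodes, nodes];                                (* A, with A_ij = K(x_i, x_j) *)
cvec = LinearSolve[amat, yvals];                                (* solve A c = y *)
s[x_] := cvec.(ker[x, #] & /@ nodes);                           (* s(x) = Σ c_j K(x, x_j) *)
s /@ nodes                                                      (* recovers yvals up to rounding *)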

Interpolation and Native Spaces

In this section and beyond, the reader is especially encouraged to play with the Manipulates and make some conjectures. By the way, you can add points to the Manipulates that use the "Locator" type of control by holding down "Alt" and clicking the mouse.

Up until now we have defined what a native space is and we have described how to do interpolation using a kernel, but we actually haven't connected these two concepts. We do so now. Suppose that we have a function f and an interpolant from our native space interpolating the function values of f, i.e.,

s_f(x) = ∑_{j=1}^n c_j K(x, x_j)  where  s_f(x_j) = f(x_j) = y_j

If the kernel is highly localized, for example something which approximately interpolates the Kronecker delta, then it is clear that

‖s_f‖² = ‖∑_{j=1}^n c_j K(., x_j)‖² = ∑_{j=1}^n ∑_{k=1}^n c_j c_k K(x_j, x_k) ≈ ∑_{j=1}^n ∑_{k=1}^n c_j c_k δ_{j,k} = ∑_{j=1}^n c_j² ≈ ∑_{j=1}^n y_j² = ∑_{j=1}^n f(x_j)²

since in this case A ≈ I, which implies that c = A⁻¹ y ≈ y. So, in the case of a highly localized kernel approximating the Kronecker delta we see that the native space norm is approximately just the l² norm of the function values, and one could therefore argue that the native space norm is a generalized type of l² or L² norm. This isn't a surprise however, since we already know that the native space is a Hilbert space. If the coefficients come from the solution to an interpolation problem, then we could say that the native space norm is some sort of weighted average of the products of the coefficients solving the interpolation problem, but this only describes how to compute the norm and provides no additional insight.

Let's take a break and test out this definition with some code. Play with the ϵ slider below. In particular you will see that the norm is indeed behaving like the l² norm of the function values as ϵ → ∞. Actually ϵ = 4 is fine for a small number of points, but as the number of points increases inside a fixed interval you would need to further localize the kernel function by increasing ϵ. Finally, you will observe that when the function values are all equal to one, a localized kernel makes the squared norm simply count the number of points.

kernelInterp1[phi_, ϵ_, list_] := Module[{A, c, xvals, yvals},
  xvals = First /@ list // N;
  yvals = Last /@ list // N;
  A = Outer[phi[ϵ #1, ϵ #2] &, xvals, xvals, 1];
  c = LinearSolve[A, yvals];
  Plot[(phi[ϵ x, ϵ #] & /@ xvals).c, {x, -5, 5},
   PlotLabel → "ns norm=" <> ToString[Sqrt[c.A.c]],
   PlotRange → {{-5, 5}, {-5, 5}}, AspectRatio → 1]]

ManipulatekernelInterp1[phi,ϵ, list],{{list,{{-2, 1},{-1, 1},{2, 1}}}, Locator,

LocatorAutoCreate→ True}, {{ϵ, 5}, .01, 10}, phi, Exp-(#1-#2) 2 &, "kernel",

2 2 Exp-(#1-#2) 2&→ TraditionalForm"ⅇ -ϵ (x-y) ", Exp[- Abs[#1-#2]]&→ TraditionalForm"ⅇ -ϵ x-y ", 1 1 &→ TraditionalForm" " 1+(#1-#2) 2 1+ϵ 2 (x-y) 2


The problem with what we have done above is that we really only have an intuitive understanding of the native space norm when ϵ is large and the kernel is very localized. When ϵ is small and the kernels start “mixing” with one another or “communicating” things are far less clear. In particular, if someone asks you, “Yeah, but how does the native space norm relate to the function directly?” so far we don’t have an answer.

Relating the Native Space Norm to the Kernel and Function Values

We would like to better understand what the native space norm is measuring. As a minimal goal, we would like to be able to look at a function and at least be able to say if it has a small or large native space norm. More precisely, since "small" and "large" are relative terms, if we are given two functions we should be able to determine the relative sizes of their native space norms by looking at visually inspectable features of the functions. To that end we will give several expressions for the norm which involve the function values. The problem with our current expression is that it directly depends on the c's. What we actually want is an expression that depends on the y's and involves norms, so that we can think about it more geometrically. It is useful to realize that the native space inner product/norm can be written in terms of matrices and vectors as

〈∑_{j=1}^n c_j K(., x_j), ∑_{k=1}^n c_k K(., x_k)〉 = ∑_{j=1}^n ∑_{k=1}^n c_j c_k K(x_j, x_k) = cᵀ A c

In fact, this is the form of the native space norm we used in the code above. Note that cᵀ A c = cᵀ y, so we get another form of the native space norm with just one c in it. We are not claiming that this is particularly useful however, since again c is hard to visualize. We would rather find an expression that does not explicitly involve c. Since A is positive definite and symmetric it has a Cholesky factorization A = Bᵀ B. (Note: we are not following the standard form of the Cholesky decomposition here, in which the second factor has the transpose; this is the convention Mathematica follows.)

cᵀ A c = cᵀ Bᵀ B c = (B c)ᵀ (B c) = ‖B c‖²

But if these coefficients arise from the solution to an interpolation problem then we have

‖s_f‖²_ℋ = cᵀ A c = ‖B c‖² = ‖B A⁻¹ y‖²

But

B A⁻¹ = B (Bᵀ B)⁻¹ = B B⁻¹ B⁻ᵀ = B⁻ᵀ

So we have

‖s_f‖²_ℋ = ‖B⁻ᵀ y‖²

There are two observations to make right now. First, we have now related the native space norm of our function (albeit one of the form ∑_{j=1}^n c_j K(., x_j)) directly to the function values, which is what we wanted. Second, the solution to this problem is almost exactly the same as the solution to the analytic geometry problem we posed about ellipses: first do Cholesky, take the inverse, and then the transpose! The additional transpose is there because we are computing something slightly different. This connection was not expected originally, so this is one of those very pleasant surprises that only mathematics can deliver. We set the ellipse problem to ourselves as a "pure" problem that we felt should have an answer, and then it turns out to be the same as a problem relevant to understanding kernels. To editorialize a bit more, this is why you should "run like hell" from small minded people who tell you that pure mathematics isn't useful: it usually is, but the application may not have been discovered yet. If you like this sort of philosophizing you should read the article called "The Unreasonable Effectiveness of Mathematics in the Natural Sciences" by Eugene Wigner.

One might be tempted to think that if all we wanted was an expression that directly depends on the y's then we can simply do the following:

cT A c=A -1 yT y=y T A-T y=y T A-T y

However, again, what we want is an expression that depends on the y's and involves norms, so that we can think about the native space norm more geometrically, so we would need to finish the above calculation to obtain

yᵀ A⁻¹ y = yᵀ (Bᵀ B)⁻¹ y = yᵀ B⁻¹ B⁻ᵀ y = (B⁻ᵀ y)ᵀ (B⁻ᵀ y) = ‖B⁻ᵀ y‖²

which is what we obtained above. At this point it is more instructive to see that this indeed works. Play with the sliders in the Manipulate below and you will see that our three expressions (there are others) for the native space norm indeed give the same answer. You should also try to gain a feel for when the native space norm is large versus when it is small, and then try to connect this to visual features of the graph.

Remark: Again, Mathematica follows the convention that the first factor has the transpose in the Cholesky decomposition. It seems to me to be a nonstandard choice so I don't know the reason.

kernelInterp2[phi_, ϵ_, list_] := Module[{A, c, xvals, yvals},
  xvals = First /@ list // N;
  yvals = Last /@ list // N;
  A = Outer[phi[ϵ #1, ϵ #2] &, xvals, xvals, 1];
  c = LinearSolve[A, yvals];
  Plot[(phi[ϵ x, ϵ #] & /@ xvals).c, {x, -5, 5},
   PlotRange → {{-5, 5}, {-5, 5}}, AspectRatio → 1,
   PlotLabel → {Sqrt[c.A.c], Norm[CholeskyDecomposition[A].c],
     Norm[Transpose[Inverse[CholeskyDecomposition[A]]].yvals]}]]

ManipulatekernelInterp2[phi,ϵ, list], {{list, {{-2, 1},{2, 1}}}, Locator, LocatorAutoCreate→ True}, {{ϵ, 5}, .01, 5}, phi, Exp-(#1-#2) 2 &, "kernel",

2 2 Exp-(#1-#2) 2&→ TraditionalForm"ⅇ -ϵ (x-y) ", Exp[- Abs[#1-#2]]&→ TraditionalForm"ⅇ -ϵ x-y ", 1 1 &→ TraditionalForm" " 1+(#1-#2) 2 1+(x-y) 2


What is being measured?

My hope was that after all of these Manipulates you have discovered for yourself how to make the native space norm larger for a fixed kernel. Roughly speaking there are four ways:

◼ You can add more points. This actually follows from an optimality property of rbf interpolants in the native space norm. For a given set of interpolation points, the rbf interpolant must be the minimum norm interpolant! A lovely result generalizing the optimality of the natural cubic splines in the H² Sobolev norm. Therefore, it follows that if you leave the old points in place and add an additional point, the native space norm must get bigger.

◼ You can make ϵ smaller...most, but not all, of the time.

◼ You can move the interpolation points so that the function values y are larger in magnitude.

◼ You can make the derivative larger by moving the interpolation points to create a large slope.

The above list was in terms of the interpolation points, but remember the native space norm depends on both the function and the kernel chosen--it doesn't know that you are sampling it to find interpolants! Therefore the above list suggests that functions that take both very large and very small values, especially if those changes happen fast with many changes of direction, will have a high native space norm. Obviously, this begs a precise formulation and some kind of proof. Note that it also leaves out the issue of what kernel we have chosen and the value of the shape parameter.

The Native Space Norm for the Gaussian Using Just Two Points

We confine ourselves to the Gaussian because for now that is our main kernel of interest. We confine ourselves to two points because the calculation is enough of a pain for two, but also two points suffice to get our point across. It is possible that three points would reveal more, but we leave that for another day. First we define an arbitrary element of the native space of the Gaussian with just two basis functions. Obviously, this can't provide a complete picture of what is going on, but we will see that this extremely special case can provide some insight.

f[x_] := c1 Exp[-ϵ^2 (x - x1)^2] + c2 Exp[-ϵ^2 (x - x2)^2]

We will now calculate the native space norm of this in two ways. The first is easy,

cᵀ A c = c1² + c2² + 2 c1 c2 ⅇ^(-(x1-x2)² ϵ²)

and is what we've been doing all along. The second way of calculating the native space norm is not necessary, and a good deal more difficult, but we wish to illustrate a way to calculate the native space norm of a general function, not one given as a linear combination of kernel "translates". Remarkably, we can, in some cases, actually write down a conventional weighted L² type of inner product formula for the native space. In particular, this can be done if the kernel is strictly positive definite, translation invariant, and globally defined on all of ℝᵈ, in which case the native space norm is given in terms of an integral of Fourier transforms. See the appendix for this result, but you can also find it in the books by Buhmann, Fasshauer, and Wendland, or the papers by Schaback. Skipping details, the result, which is quite lovely by the way, basically says that for such a kernel we have

‖f‖²_ℋ = (2π)^(-d/2) ∫_{ℝᵈ} |f̂(ω)|² / Φ̂(ω) dω

For us this is just

‖f‖_𝒩^2 = (1/√(2π)) ∫_(-∞)^∞ |f̂(ω)|^2 / Φ̂(ω) ⅆω

So we may calculate.

FourierTransform[f[x], x, w] // PowerExpand// Simplify

(ⅇ^(-w^2/(4 ϵ^2)) (c1 ⅇ^(ⅈ w x1) + c2 ⅇ^(ⅈ w x2)))/(Sqrt[2] ϵ)

FourierTransformExp-ϵ 2 x2, x, w// PowerExpand// Simplify

ⅇ^(-w^2/(4 ϵ^2))/(Sqrt[2] ϵ)

Simplify[
 ((ⅇ^(-w^2/(4 ϵ^2)) (c1 ⅇ^(ⅈ w x1) + c2 ⅇ^(ⅈ w x2)))/(Sqrt[2] ϵ)) *
   Conjugate[(ⅇ^(-w^2/(4 ϵ^2)) (c1 ⅇ^(ⅈ w x1) + c2 ⅇ^(ⅈ w x2)))/(Sqrt[2] ϵ)] /
   (ⅇ^(-w^2/(4 ϵ^2))/(Sqrt[2] ϵ)),
 w ∈ Reals && ϵ ∈ Reals]

(ⅇ^(-w^2/(4 ϵ^2)) (c1 ⅇ^(ⅈ w x1) + c2 ⅇ^(ⅈ w x2)) Conjugate[c1 ⅇ^(ⅈ w x1) + c2 ⅇ^(ⅈ w x2)])/(Sqrt[2] ϵ)

Interestingly, if you try a direct assault on this integral in Mathematica it takes a while. I just aborted it.

1/Sqrt[2 π] Integrate[
  (ⅇ^(-w^2/(4 ϵ^2)) (c1 ⅇ^(ⅈ w x1) + c2 ⅇ^(ⅈ w x2)) Conjugate[c1 ⅇ^(ⅈ w x1) + c2 ⅇ^(ⅈ w x2)])/(Sqrt[2] ϵ),
  {w, -∞, ∞}]

$Aborted

Instead, first foil and then apply a trig identity, and this becomes reasonable.

(ⅇ^(-w^2/(4 ϵ^2)) (c1^2 + c2^2 + c1 c2 ⅇ^(ⅈ w (x1 - x2)) + c1 c2 ⅇ^(ⅈ w (x2 - x1))))/(Sqrt[2] ϵ)

(ⅇ^(-w^2/(4 ϵ^2)) (c1^2 + c2^2 + 2 c1 c2 Cos[w (x1 - x2)]))/(Sqrt[2] ϵ)

Now try it and it works.

1/Sqrt[2 π] Integrate[
  (ⅇ^(-w^2/(4 ϵ^2)) (c1^2 + c2^2 + 2 c1 c2 Cos[w (x1 - x2)]))/(Sqrt[2] ϵ),
  {w, -∞, ∞}, Assumptions -> ϵ > 0]

c1^2 + c2^2 + 2 c1 c2 ⅇ^(-(x1 - x2)^2 ϵ^2)

This is just c^T A c, as we said. Up to now we have not assumed that c arises from interpolation. We do so now, pulling out some of the code we've used over and over again.

Clear[x1, y1, x2, y2]
phi = Exp[-(#1 - #2)^2] &;
xvals = {x1, x2};
yvals = {y1, y2};
A = Outer[phi[ϵ #1, ϵ #2] &, xvals, xvals, 1];
{c1, c2} = LinearSolve[A, yvals]

{(ⅇ^((-x1 ϵ + x2 ϵ)^2) (ⅇ^((-x1 ϵ + x2 ϵ)^2) y1 - y2))/(-1 + ⅇ^(2 (-x1 ϵ + x2 ϵ)^2)),
 (ⅇ^((-x1 ϵ + x2 ϵ)^2) (-y1 + ⅇ^((-x1 ϵ + x2 ϵ)^2) y2))/(-1 + ⅇ^(2 (-x1 ϵ + x2 ϵ)^2))}

Eigenvectors[A]

{{-1, 1}, {1, 1}}

c1^2 + c2^2 + 2 c1 c2 ⅇ^(-(x1 - x2)^2 ϵ^2) // Simplify

(ⅇ^((x1 ϵ - x2 ϵ)^2) (-2 y1 y2 + ⅇ^((x1 ϵ - x2 ϵ)^2) (y1^2 + y2^2)))/(-1 + ⅇ^(2 (x1 - x2)^2 ϵ^2))

Since the behavior of this isn't totally clear, we make a substitution, t = x1 - x2, so that we can hit it with the Series command.

Normal[Series[ⅇ^(ϵ^2 t^2) (-2 y1 y2 + ⅇ^(ϵ^2 t^2) (y1^2 + y2^2))/(-1 + ⅇ^(2 ϵ^2 t^2)), {t, 0, 1}]] /. t -> (x1 - x2) // Simplify

(y1 - y2)^2/(2 (x1 - x2)^2 ϵ^2) + (y1^2 + y2^2)/2

So it is now clear why we saw the kind of behavior we saw. Indeed, if you make the function values large in magnitude the native space norm gets bigger, but making ϵ small or the derivative large has the same effect, and doing both at the same time really makes the norm grow. We interpret this in terms of interpolation: it is hard to interpolate something with large derivatives using something wide and flat. Not a very original insight, I think. More generally, one might say that the native space norm of f measures the difficulty of interpolating f relative to the choice of the kernel K, since, presumably, larger coefficients are required when the kernel is a less appropriate choice. One might argue further, not very rigorously but with imagination, that as the native space norm gets larger it is approaching ∞, and therefore the function in question is increasingly on the way to not being in the native space at all, in which case it is hardly a surprise that you shouldn't try to interpolate it with the offending kernel. We also notice that if we confine ourselves to the so-called stationary setting, in which we scale the shape parameter so as to keep hϵ constant (here h = x2 - x1 is the fill distance or mesh size), then the size of the function values dominates in this expression. The above expression looks like a surrogate for an H1 or "energy" type of norm,

(1/2) ‖f‖_2^2 + (1/(2 ϵ^2)) ‖f′‖_2^2,

in which, incidentally, smaller ϵ gives more weight to the size of the derivative relative to the size of the function. Now, we mentioned in the title that we were only going to look at the Gaussian for two points, due to time and space considerations, but it should be clear how to do the same calculation for other kernels and more points (a small sketch for the exponential kernel appears at the end of this subsection). One thing we should notice immediately is that we easily could have gotten more mileage out of our series calculation by increasing the number of terms.

Factor[#] & /@
 Normal[Series[ⅇ^(ϵ^2 t^2) (-2 y1 y2 + ⅇ^(ϵ^2 t^2) (y1^2 + y2^2))/(-1 + ⅇ^(2 ϵ^2 t^2)), {t, 0, 10}]]

(y1 - y2)^2/(2 t^2 ϵ^2) + (y1^2 + y2^2)/2 + 1/6 t^2 (y1^2 + y1 y2 + y2^2) ϵ^2 -
 1/360 t^6 (4 y1^2 + 7 y1 y2 + 4 y2^2) ϵ^6 + 1/15120 t^10 (16 y1^2 + 31 y1 y2 + 16 y2^2) ϵ^10

So, in addition to our observation about fixed ϵ: if we shrink our mesh size or our ϵ, the contributions from the third term on vanish, since the dependence on the function values appears to stay quadratic while the powers of tϵ grow. Interesting. One suspects that we should use more points to see if this persists. One more observation: as you may have noticed, the native space norm gets bigger as you make ϵ smaller. As you might have guessed, as ϵ → 0 the native space norm approaches infinity. You can see this from the above series, or you can simply try to compute the limit.

Limit[ⅇ^((x1 ϵ - x2 ϵ)^2) (-2 y1 y2 + ⅇ^((x1 ϵ - x2 ϵ)^2) (y1^2 + y2^2))/(-1 + ⅇ^(2 (x1 - x2)^2 ϵ^2)), ϵ -> 0]

DirectedInfinity[Sign[y1 - y2]^2/Sign[x1 - x2]^2]

This is the native space norm squared. Taking square roots, and ignoring the whole issue of infinity, we see that the slope of the line that the interpolant converges to in the "flat limit" has just popped out! This business of polynomials as "flat limits" is very cool in my opinion.
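As promised above, here is a minimal sketch of the same two-point calculation for a different kernel, the exponential kernel ⅇ^(-ϵ|x-y|). The names kerExp, Aexp, cExp are just for this sketch, and the assumptions ϵ > 0, x1 < x2 are only there so Simplify can resolve the absolute value; the two-point quadratic form only sees the kernel through the off-diagonal entry of A, so the algebra goes through verbatim.

(* sketch: the two-point squared native space norm for the exponential kernel *)
Clear[x1, x2, y1, y2, ϵ];
kerExp = Exp[-Abs[#1 - #2]] &;
Aexp = Outer[kerExp[ϵ #1, ϵ #2] &, {x1, x2}, {x1, x2}, 1];
cExp = LinearSolve[Aexp, {y1, y2}];
Simplify[cExp.Aexp.cExp, Assumptions -> {ϵ > 0, x1 < x2}]
(* same structure as the Gaussian case, with ⅇ^(-(x1-x2)^2 ϵ^2) replaced by ⅇ^(-ϵ (x2-x1)) *)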

Can We Visualize the Native Space Norm in Another Way?
Yes, we can, and we will do so in the Manipulate below. We are not claiming that this is particularly valuable; we are doing it mainly because we like the picture. We have a second reason, however, which we will mention at the end. Let's recap our discussion thus far. Our initial calculation of c^T A c for the native space norm essentially comes down to thinking of an ellipse in the form

E = {(x, y) : (x, y) P (x, y)^T = 1}, where P = [[p, q], [q, r]] is (symmetric) positive definite.

We then showed that c^T A c = ‖B^{-T} y‖^2, where B is the factor coming from the Cholesky factorization of A; in other words, we are thinking of our ellipse as the image of a circle,

E = {S (cos(θ), sin(θ))^T : θ ∈ [0, 2π]}, where S = [[a, b], [c, d]].
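Here is a minimal numerical sketch of that identity with made-up data. B is Mathematica's CholeskyDecomposition[A], the upper-triangular factor with A = B^T B.

(* sketch: the quadratic form equals the squared length of B^{-T} y *)
With[{xs = {-1., 1.}, ys = {1.5, 1.}, eps = 2.},
 Module[{A, c, B},
  A = Outer[Exp[-(eps #1 - eps #2)^2] &, xs, xs, 1];
  c = LinearSolve[A, ys];
  B = CholeskyDecomposition[A];
  {c.A.c, Norm[Transpose[Inverse[B]].ys]^2}]]
(* the two numbers agree up to rounding *)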

Now B-T y can be visualized as the generalized radius of an ellipse somewhere between the value of the major and minor axes of our ellipse. But we know from Trefethen’s book on Numerical Linear algebra that axes of a “hyper ellipse” (the image of a sphere under a matrix) have lengths equal to the analytic_geom_kernels_native_spaces.nb 35

singular values of the matrix A. Finally, we note that it is easier to calculate the singular values of the inverse of the Cholesky matrix just by computing the Eigenvalues of the original matrix and raising them to the - 1 power. For illustration purposes, we have redundantly implemented these ideas in the code 2 below. kernelInterp3[phi_,ϵ_, x1_, x2_, y1_, y2_]:= Module{A, c, key, singKey, sing}, A= Outer[phi[ϵ#1,ϵ#2] &,{x1, x2},{x1, x2}, 1]; - 1 sing= Eigenvalues[A] 2 ; key= Inverse[CholeskyDecomposition[A]]; singKey= SingularValueDecomposition[key]; c= LinearSolve[A,{y1, y2}]; GraphicsRowPlot(phi[ϵ x,ϵ#]&/@{x1, x2}).c,{x,-4, 4},

PlotLabel→ Grid{"small sing", "ns norm", "largest sing"},

 y12 + y22 sing[[1]], N[c.A.c], y12 + y22 sing[[2]],

Frame→ All, PlotRange→{{-4, 4},{-4, 4}},

AspectRatio→ 1, Epilog→{Red, Point[{{x1, y1},{x2, y2}}]},

ParametricPlot y12 + y22 key.{Cos[θ], Sin[θ]},{θ, 0, 2π},

PlotLabel→ Grid{"small sing", "ns norm", "largest sing"},

 y12 + y22 singKey[[2, 2, 2]], Norm[Transpose[key].{y1, y2}],

y12 + y22 singKey[[2, 1, 1]], Frame→ All,

PlotRange→ {{-10, 10}, {-10, 10}}, AspectRatio→ 1, Epilog→Red, Line{0, 0},{z, w}/. FindRoot{z, w}.A.{z, w}⩵ y1 2 + y22, z2 +w 2 ⩵ Norm[Transpose[key].{y1, y2}]2,{{z, y1},{w, y2}}, Green, Circle[{0, 0}, Norm[Transpose[key].{y1, y2}]], Orange,

Circle{0, 0}, y12 + y22 singKey[[2, 1, 1]], Purple,

Circle{0, 0}, y12 + y22 singKey[[2, 2, 2]] 36 analytic_geom_kernels_native_spaces.nb

ManipulatekernelInterp3[phi,ϵ, x1, x2, y1, y2], {{ϵ, 5}, .001, 5},{{x1,-1},-3, 3}, {{x2, 1},-3, 3}, {{y1, 1.5},-3, 3}, {{y2, 1},-3, 3}, phi, Exp-(#1-#2) 2 &, "kernel",

2 2 Exp-(#1-#2) 2&→ TraditionalForm"ⅇ -ϵ (x-y) ", Exp[- Abs[#1-#2]]&→ TraditionalForm"ⅇ -ϵ x-y ", 1 1 &→ TraditionalForm" " 1+(#1-#2) 2 1+ϵ 2 (x-y) 2

(Manipulate output: sliders for ϵ, x1, x2, y1, y2 and a setter bar for the kernel; the left panel plots the interpolant with the data points, the right panel the ellipse and circles described below. In the saved snapshot both panels report small sing 1.59225, ns norm 5.67348, largest sing 6.53074.)

As we near the end of our journey, it would appear that we've come full ellipse and returned to the topic with which we started. More precisely, we have an ellipse and three circles: the circumscribing and inscribing circles of the ellipse, and an in-between circle whose radius is the red line. The length of this generalized radius (circles and ellipses are projectively equivalent, of course) is the native space norm of the function on the left, and it is clear how to make it grow and shrink by playing with the y values, the slopes, and the shape parameter itself.

Anyway, one conclusion is that, though this example is simple, we imagine that a curve with more interpolation points that wiggles a lot could be visualized via an ellipsoid in a high-dimensional space with major and minor axes of very different sizes. Since a computer program is not a proof, we now demonstrate why this code does what is claimed. To get an upper bound on the native space norm in terms of the function values, we can proceed as follows:

‖f‖_𝒩 = ‖B^{-T} y‖ ≤ ‖B^{-T}‖ ‖y‖ = λ_min^{-1/2} ‖y‖

The last equality can be seen from ‖B^{-T}‖ = ρ(B^{-1} B^{-T})^{1/2} = ρ((B^T B)^{-1})^{1/2} = ρ(A^{-1})^{1/2} = (λ_min^{-1})^{1/2} = λ_min^{-1/2}.

We may also get a lower bound on ‖f‖_𝒩 by turning the standard crank for this situation:

‖y‖ = ‖B^T B^{-T} y‖ ≤ ‖B^T‖ ‖B^{-T} y‖ = λ_max^{1/2} ‖B^{-T} y‖

which implies that λ_max^{-1/2} ‖y‖ ≤ ‖B^{-T} y‖ = ‖f‖_𝒩, where the last equality follows in a manner similar to the calculation above. So, in summary, we have shown the following equivalent statements; the first one is the one we've implemented.

λ_max^{-1/2} ‖y‖ ≤ ‖f‖_𝒩 ≤ λ_min^{-1/2} ‖y‖
λ_max^{-1/2} ≤ ‖f‖_𝒩 / ‖y‖ ≤ λ_min^{-1/2}
λ_max^{1/2} ≥ ‖y‖ / ‖f‖_𝒩 ≥ λ_min^{1/2}
λ_min^{1/2} ‖f‖_𝒩 ≤ ‖y‖ ≤ λ_max^{1/2} ‖f‖_𝒩

Moreover, it can be shown that equality is achieved by the eigenvectors (1, 1) and (1, -1).
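Here is a minimal numerical sketch of the first (implemented) statement, with made-up data and the Gaussian kernel.

(* sketch: λ_max^(-1/2) ‖y‖ ≤ ‖f‖_𝒩 ≤ λ_min^(-1/2) ‖y‖ for a two-point interpolant *)
With[{xs = {-1., 1.}, ys = {1.5, 1.}, eps = 2.},
 Module[{A, c, lam},
  A = Outer[Exp[-(eps #1 - eps #2)^2] &, xs, xs, 1];
  c = LinearSolve[A, ys];
  lam = Eigenvalues[A];
  {Norm[ys]/Sqrt[Max[lam]], Sqrt[c.A.c], Norm[ys]/Sqrt[Min[lam]]}]]
(* the middle number sits between the outer two *)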

This brings us to the other reason alluded to above: the conclusion that ‖y‖ ≤ λ_max^{1/2} ‖f‖_𝒩 ≤ √2 ‖f‖_𝒩, which holds since λ_min + λ_max = 2 and λ_min ≥ 0, λ_max ≥ 0. This result appears to me to be a discrete and finite version of the result (from Wendland) that for compact Ω we have

‖f‖_{L2(Ω)}^2 ≤ (∫_Ω K(x, x) ⅆx) ‖f‖_𝒩^2

since ∫_Ω K(x, x) ⅆx corresponds to tr(A) = 2 if we use the counting measure, together with the fact that the trace of a matrix is the sum of its eigenvalues. This result establishes that 𝒩 can be continuously embedded in L2. One then finds the adjoint of this embedding and uses it, together with Mercer's theorem, to find an eigenbasis with which to represent the original kernel. This is significantly more advanced than what we have been doing, but it is nice to see that these more advanced notions have fairly elementary origins.

How Do We Calculate the Native Space Norm of an Arbitrary Function?
The question asked in the title of this section is sort of the 800 pound gorilla sitting in the corner of the room. Let's review what we have done so far.

◼ We stated what the native space norm of something of the form ∑_{j=1}^n c_j K(x, x_j) is.

◼ Then we numerically found the native space norm of an interpolant s_f(x) = ∑_{j=1}^n c_j K(x, x_j), where we are given f(x_j) = y_j, by solving the interpolation equations for the c's.
◼ Then we found an expression for the native space norm of an interpolant in terms that only directly depend on the kernel and the function values. Granted, this depends on both matrix inversion and the Cholesky factorization.
◼ By playing with sliders we gained some informal insight into what the native space norm is measuring.
◼ Then we found an exact analytic expression for the native space norm of a two-point interpolant in terms of the function values, which verified our previous informal observations.
◼ Finally, we created a Manipulate which put much of this together visually, even providing a way to see the size of the native space norm in terms of an ellipse, using the singular value decomposition.

Now for the shortcoming of what we've done. Usually, if we are given an arbitrary function on ℝ, it isn't presented to us as a finite linear combination of kernel functions and their coefficients, nor even as the solution to an interpolation problem using a known basis of kernel functions. Since we are often interested in theoretical results, it is given to us as a formula. Therefore we would like to have formulas or methods for computing the native space norm that depend only on function values. If we restrict ourselves to kernels that are strictly positive definite, translation invariant, and globally defined on all of ℝ^d (where d = 1 for us), we may appeal to the result in the appendix and compute the integral

‖f‖_𝒩^2 = (2π)^(-d/2) ∫_ℝ^d |f̂(ω)|^2 / Φ̂(ω) ⅆω.

The problem is that, even with all of these restrictions, for many choices of f and Φ we won't actually be able to compute this integral exactly. If we can, great; but if not, then we can try to evaluate this integral numerically. Once we have decided to go the route of numerics, however, a third approach presents itself: approximate the function on ℝ by an interpolant, with perhaps many points and a small mesh size, and then just use the definition. Suppose f ∈ 𝒩. By definition there exists a Cauchy sequence f_k ∈ span{K(., x_kj) : x_kj ∈ Ω, j = 1, 2, ..., m_k}, or, in other words, there exists f_k = ∑_{j=1}^{m_k} c_kj K(., x_kj) such that for all ϵ > 0 there exists an N such that k > N implies ‖f - f_k‖_𝒩 < ϵ. But since all norms are continuous, this implies, by the usual corollary of the triangle inequality, that |‖f‖_𝒩 - ‖f_k‖_𝒩| ≤ ‖f - f_k‖_𝒩 < ϵ, or in other words that ‖f_k‖_𝒩 → ‖f‖_𝒩. Since we know how to compute the native space norm of linear combinations of our kernel elements, and we can solve for the c_kj using our interpolation matrix, this gives us a practical alternative way to compute the native space norm in some cases. We have a feeling that the above argument is a bit "hand wavy" and that perhaps we should be using a notion like "Cauchy nets" or some such--which we don't know. Rather than trying to improve the rigor, we illustrate this approach in the code below.

Illustrating the Above Approach to Calculating the Native Space Norm
We will illustrate the above approach by calculating the native space norm of the Gaussian function f(t) = ⅇ^(-t^2) using another Gaussian, Φ(t) = ⅇ^(-ϵ^2 t^2). First we will use the result in the appendix to compute an exact value, and then we will compare the exact value with our approximations via interpolation.

FourierTransformExp-t 2, t, w, FourierTransformExp-ϵ 2 t2, t, w

2 w2 - w - ⅇ 4 ⅇ 4ϵ 2  ,  2 2 ϵ2

1/Sqrt[2 π] Integrate[(ⅇ^(-w^2/4)/Sqrt[2])^2/(ⅇ^(-w^2/(4 ϵ^2))/(Sqrt[2] ϵ)), {w, -∞, ∞}]

ConditionalExpression[ϵ^2/Sqrt[2 ϵ^2 - 1], Re[1/ϵ^2] < 2]

Reduce[1/ϵ^2 < 2, ϵ]

ϵ < -1/Sqrt[2] || ϵ > 1/Sqrt[2]

1/Sqrt[2] // N

0.707107

Note that in the above calculation we have, along the way, shown a simple but perhaps not widely known result: f(t) = ⅇ^(-t^2) is not in the native space of Φ(t) = ⅇ^(-ϵ^2 t^2) unless ϵ > 1/√2. This mini result is undoubtedly not new, but it was discovered by the author and his research student Sergey Papushin--who is in high school, incidentally. We interpret it roughly as saying that you shouldn't try to interpolate/extrapolate a Gaussian with a Gaussian that is relatively too wide, where the "too wide" factor is given by the magic number 1/√2. Sergey will present the results of his research in the upcoming "Optimal Point Selection for Kernel Interpolation".
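As a quick follow-up sketch, assuming the exact value reconstructed above, the squared norm blows up as ϵ approaches the critical width from above (the Direction spelling below is the one used in recent Mathematica versions):

(* sketch: the exact squared native space norm blows up as ϵ → 1/Sqrt[2] from above *)
Limit[ϵ^2/Sqrt[2 ϵ^2 - 1], ϵ -> 1/Sqrt[2], Direction -> "FromAbove"]
(* ∞ *)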

interpolateGaussian[ϵ_, n_] :=
 Module[{list, phi, A, c, xvals, yvals, func},
  list = {#, Exp[-#^2]} & /@ Range[-5, 5, 10/(n - 1)];
  xvals = First /@ list // N;
  yvals = Last /@ list // N;
  phi = Exp[-(#1 - #2)^2] &;
  A = Outer[phi[ϵ #1, ϵ #2] &, xvals, xvals, 1];
  c = LinearSolve[A, yvals] // Quiet;
  func = (phi[ϵ x, ϵ #] & /@ xvals).c;
  Plot[{func, ⅇ^(-x^2)}, {x, -10.1, 10.1}, Epilog -> {Red, Point[list]},
   PlotRange -> {{-10.1, 10.1}, {-1, 1}}, Axes -> True, AspectRatio -> 1,
   PlotLabel -> {c.A.c, ϵ^2/Sqrt[2 ϵ^2 - 1]}]]

Manipulate[interpolateGaussian[ϵ, n],{{ϵ, 2.0}, .5, 10},{{n, 2}, 2, 100, 1}]

(Manipulate output: sliders for ϵ and n together with the interpolateGaussian plot; in the saved snapshot the label reads {2.82503, 2.82503}, i.e. the interpolant's squared norm essentially matches the exact value.)

In the Manipulate, choose a value of ϵ and then increase the value of n to see the native space norm of the interpolant approach the expected exact value of the native space norm of the function with respect to the Gaussian with the given ϵ.
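For a non-interactive version of the same experiment, here is a minimal sketch (ϵ = 2 chosen arbitrarily; interpNorm2 is a throwaway helper) tabulating the interpolant's squared norm for a few nested point sets next to the exact value.

(* sketch: c.A.c for n = 5, 9, 17, 33 equally spaced points on [-5, 5],
   compared with the exact squared norm computed earlier *)
With[{ϵ = 2.},
 Module[{interpNorm2},
  interpNorm2[n_] := Module[{xvals, yvals, A, c},
    xvals = N[Range[-5, 5, 10/(n - 1)]];
    yvals = Exp[-xvals^2];
    A = Outer[Exp[-(ϵ #1 - ϵ #2)^2] &, xvals, xvals, 1];
    c = LinearSolve[A, yvals];
    c.A.c];
  {Table[interpNorm2[n], {n, {5, 9, 17, 33}}], ϵ^2/Sqrt[2 ϵ^2 - 1]}]]
(* the table entries increase toward the exact value *)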

Further Work
One issue that we might pursue is the possible connection to the Newton basis. Schaback's 2011 paper establishes the connection between the Cholesky factorization and the Newton basis.
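As a tiny starting point, here is a sketch of that connection as I understand it from that paper, with made-up points and no pivoting: the matrix of values of the Newton-type basis functions at the data sites should be readable off the lower-triangular Cholesky factor of A.

(* sketch: A = L.Transpose[L]; L is claimed to hold the Newton basis values at the nodes *)
With[{xs = {-2., 0., 1.}, eps = 1.},
 Module[{A, L},
  A = Outer[Exp[-(eps #1 - eps #2)^2] &, xs, xs, 1];
  L = Transpose[CholeskyDecomposition[A]];  (* lower-triangular factor *)
  {MatrixForm[L], Chop[A - L.Transpose[L]]}]]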

Appendix

We include a result, referred to earlier, regarding the native space norm of a function defined on all of ℝ^d when the kernel is a translation invariant positive definite function. This is taken from Wendland's book.

Theorem: Suppose that Φ ∈ C(ℝ^d) ⋂ L1(ℝ^d) is a real valued positive definite function. Define

𝒩 = { f ∈ C(ℝ^d) ⋂ L2(ℝ^d) : f̂ / √Φ̂ ∈ L2(ℝ^d) }

and equip this space with the bilinear form

(f, g)_𝒩 = (2π)^(-d/2) (f̂/√Φ̂, ĝ/√Φ̂)_{L2(ℝ^d)} = (2π)^(-d/2) ∫_ℝ^d (f̂(ω) Conjugate[ĝ(ω)])/Φ̂(ω) ⅆω

Then 𝒩 is a real Hilbert space with inner product (., .)_𝒩 and reproducing kernel Φ(. - .), and 𝒩 is the native space of Φ on ℝ^d.
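As a small sanity check of this theorem (a sketch only, with ϵ = 1 and symbolic real centers x1, x2; the helper names ftPhi, ft1, ft2 are mine, and for a real function the conjugate of its transform is obtained by w → -w), we can verify the reproducing property (Φ(. - x1), Φ(. - x2))_𝒩 = Φ(x1 - x2) for the Gaussian directly from the Fourier form of the inner product:

(* sketch: checking (Φ(.-x1), Φ(.-x2))_𝒩 == Φ(x1-x2) for the Gaussian with ϵ = 1 *)
Clear[t, w, x1, x2];
ftPhi = FourierTransform[Exp[-t^2], t, w];
ft1 = FourierTransform[Exp[-(t - x1)^2], t, w];
ft2 = FourierTransform[Exp[-(t - x2)^2], t, w];
ip = 1/Sqrt[2 π] Integrate[ft1 (ft2 /. w -> -w)/ftPhi, {w, -∞, ∞},
   Assumptions -> {x1 ∈ Reals, x2 ∈ Reals}];
Simplify[ip - Exp[-(x1 - x2)^2]]
(* should simplify to 0 *)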

Acknowledgments and a few Comments
I would like to thank my parents, especially my mother, for teaching me the value of intellectual pursuits and for making me mentally tough enough to put up with all the nonsense such pursuits inevitably brought. I am also grateful to my high school math teachers for doing a good job. I must also acknowledge the brilliant algebraic topologist Victor Guggenheim for always being absurdly generous with his time when I was just a kid. Victor gave me his mathematical blessings when I was in high school, and this is partly the reason, despite plenty of non-encouragement, or even actual discouragement, that I kept right on going. It is an unpleasant but necessary thing to say, but I suspect that we (mathematicians) have to partly accept some blame for the paucity and homogeneity of the students that wish to study our subject. I am also grateful to Arnold Ross and the Ross Summer Program for providing a rich summer experience in conjecturing and proving theorems. While I am quite happy that the Inquiry Based Learning (IBL) movement has regained some popularity recently, it seems rather bizarre to me that many in this community are either unaware of, or unwilling to acknowledge, the concrete, long term, and real contributions of the Ross program in the United States, as evidenced by its many alumni (far more illustrious than the author, incidentally). This sort of reminds me of the whole "east coast" vs "west coast" debate actually. It's all right if you don't get the reference. I am also grateful to IIT and the IIT Department of Applied Mathematics for providing a stimulating environment in which to learn and do research. In particular, I am grateful to the chair of the math department, Fred Hickernell, and my advisor, Greg Fasshauer, for leading the meshfree/kernel methods group with which I have been associated since its inception, and which has always provided helpful feedback on my mathematical ruminations and tolerated my tendencies toward speculation. I would also like to single out my advisor Greg Fasshauer, who introduced me to this topic and taught me all the correct things that I know -- any shortcomings are my own, however. I would also like to thank the authors of the books and papers mentioned throughout and in the bibliography for giving us all so much to think about. Finally, while I know it is not fashionable to like commercial software, I am grateful to Wolfram Research for providing such an incredibly fun and powerful environment to work in.

Bibliography
Aronszajn, N. (1950). Theory of reproducing kernels, Trans. Amer. Math. Soc. 68, 337--404.

Buhmann, M. D. (2003). Radial Basis Functions. Cambridge: Cambridge University Press.

Cheney, W. (2001). Analysis for Applied Mathematics. New York: Springer-Verlag.

Fasshauer, G. E. (2007). Meshfree Approximation Methods with MATLAB. Interdisciplinary Mathematical Sciences 6. Singapore: World Scientific Publishers.

Iske, A. (2004). Multiresolution Methods in Scattered Data Modelling. Berlin: Springer-Verlag.

Schaback, R. (1999). Native Hilbert spaces for radial basis functions I, in New Developments in Approximation Theory, M. W. Muller, M. D. Buhmann, D. H. Mache and M. Felten (eds.), Birkhauser (Basel), 255--282.

Schaback, R. (2000). A unified theory of radial basis functions. Native Hilbert spaces for radial basis functions II, J. Comput. Appl. Math. 121, pp. 165--177.

Schaback, R. & Pazouki, M. (2011). Bases for Kernel-based Spaces, J. Comput. Appl. Math. 236, pp. 575--588.

Trefethen, L. N. & Bau, D. (1997). Numerical Linear Algebra. Philadelphia, PA: SIAM.

Wendland, H. (2005). Scattered Data Approximation. Cambridge: Cambridge University Press.