Algorithmic Definitions of Singular Functions
by Daniel Bernstein
A thesis submitted in partial fulfillment of the requirements for Departmental Honors in Mathematics Davidson College May 13, 2013
Advisor: Richard D. Neidinger Second Reader: Irl C. Bivens Contents
1 Sets of Discontinuity and Increasing Functions 1 1.1 Dirichlet-Like Functions ...... 1 1.2 Sets of Discontinuity ...... 3 1.3 Increasing Functions ...... 5
2 Derivative of an Increasing Function 8 2.1 Existence ...... 8 2.2 The Cantor Function ...... 15
3 DeRham’s Function 16 3.1 A Natural Probability Question ...... 17 3.2 Fractal Properties ...... 18 3.3 Algorithmic Definition ...... 19 3.4 Analytic Properties ...... 20
0 3.5 Points at Which fa(x) Exists ...... 27
4 Minkowski’s Question-Mark Function 30 4.1 Algorithmic Definition of ?(x)...... 31 4.2 Notes and Remarks ...... 33 4.3 Mediants, Endpoints and an Algebraic Property ...... 33 4.4 Analytic Properties ...... 36 4.5 Salem’s Definition ...... 37 4.5.1 Continued Fractions ...... 37 4.5.2 Connection Between Convergents and Mediants ...... 40 4.5.3 Agreement with Salem’s Extension ...... 46
5 Conclusion and Ideas For Further Study 47
2 Abstract
A function f on an interval [a, b] is singular if f(a) < f(b), f is increasing (non- decreasing), f is continuous everywhere and f 0(x) = 0 almost everywhere. In this thesis, we focus on the strictly increasing singular functions of DeRham and Minkowski. We propose algorithmic definitions of each of these functions, and use these new defini- tions to provide alternative proofs of known properties about the values and derivatives of these functions. Before that, we give some background on sets of discontinuity for general functions, for increasing functions, and some background on the derivatives of increasing functions. 1 Sets of Discontinuity and Increasing Functions
In early calculus courses, most of the functions we encounter are continuous at every point in their domain. We sometimes encounter functions that have at most a finite number of discontinuities on any finite interval but this is usually as discontinuous as it gets. One might wonder if it is possible to define a function on a finite interval that is discontinuous at every point in its domain. More generally, one might wonder about what restrictions exist on the set of points at which a function is discontinuous.
1.1 Dirichlet-Like Functions
Consider the function D : (0, 1) → (0, 1) defined as follows:
1 if x ∈ (0, 1) ∩ Q, D(x) = 0 if x ∈ (0, 1) \ Q.
Dirichlet proposed this function in 1829 and it was groundbreaking because hitherto, people generally thought about defining functions only using analytic expressions. It gives us an example of a function defined on an interval that is discontinuous at every point in its domain. We can extend the main idea behind Dirichlet’s function to create other functions that are discontinuous on unusual sets. Now, we give an example of a function E : (0, 1) → (0, 1) that is continuous at exactly one point: 1 x − 2 if x ∈ (0, 1) ∩ Q E(x) = 1 2 − x if x ∈ (0, 1) \ Q
Figure 1 shows an approximate graph of E.
1 This function is continuous at the point x = 2 and discontinuous everywhere else. Thomae used the idea of Dirichlet to define the function T : (0, 1) → (0, 1) which is
1 Figure 1: Graph of E continuous at every irrational in (0, 1) and discontinuous at every rational in (0, 1). Figure 2 shows the graph of T .
1 d if x ∈ Q and has denominator dinreducedform T (x) = 0 if x is irrational
Graph of T . Source: http://en.wikipedia.org/wiki/File:Thomae function (0,1).svg
From looking at its graph, it isn’t too surprising that T it is discontinuous at each rational. It is less obvious that T is continuous at each irrational, so we will prove that
1 formally. Let x ∈ (0, 1) \ Q. Let ε > 0. Let q ∈ N be such that q < ε. For each i ∈ {2, 3, ..., q} let p δi = min {|x − |}. p∈{1,2,...,i−1} i
Let δ = min{δi}. Let c ∈ (x − δ, x + δ). We will now show that |f(x) − f(c)| < ε,
2 completing the proof that f is continuous at x. Since x is irrational, f(x) = 0 and since f is nonnegative, we have |f(x)−f(c)| = |f(c)| = f(c). So now if we show f(c) < ε then we are done. If c is irrational, f(c) = 0 so we would be done. So assume c is rational.
n Assume c = d in reduced form. Note that d > q because we defined δ such that the interval (x − δ, x + δ) contains no fractions with denominator less than or equal to q. So
1 1 f(c) = d < q < ε. One might wonder, can a function be discontinuous on any arbitrary set? As we will see, the answer is no.
1.2 Sets of Discontinuity
Definition. Let f be some function defined on R. We define the set of discontinuity of f, Df , to be the set of all points at which f is discontinuous.
Definition. We say that a set S is a Fσ set if S can be expressed as a countable union of closed sets.
The goal of this section is to show that for every function f, the set Df is an Fσ set. The structure of this section is taken from some of the exercises in [1]. First we need some more definitions.
Definition. Let f be a function defined on R and let α > 0. We say that f is α- continuous at x if there exists some δ > 0 such that for all y, z ∈ (x − δ, x + δ), |f(y) − f(z)| < α.
It follows directly from the definition of continuity that a function f is continuous at x if and only if f is α-continuous at x for each α > 0.
Definition. Let f be some function defined on R. Let α > 0. We define the set of
α-discontinuity of f, Df (α), to be the set of all points at which f is α-discontinuous.
Lemma 1. Let f be a function defined on R, let α > 0. Then Df (α) is closed.
3 Proof. Let x be a limit point of Df (α). We will show that f is not α-continuous at x.
Let {xn} be a sequence of points in Df (α) such that xn → x as n → ∞. Let δ > 0. We want to find y, z ∈ (x − δ, x + δ) such that |f(y) − f(z)| ≥ α. Let n be such that ˆ |x − xn| < δ and let δ = δ − |x − xn|. Since f is not α-continuous at xn, then there exist ˆ ˆ ˆ ˆ y, z ∈ (x−δ, x+δ) such that |f(y)−f(z)| ≥ α. Note that (xn −δ, xn +δ) ⊂ (x−δ, x+δ) and thus y, z ∈ (x − δ, x + δ), so f is not α-continuous at x.
Lemma 2. Let f be a function defined on R. Let 0 < α1 < α2. Then Df (α2) ⊆ Df (α1).
Proof. Let x ∈ Df (α2). So f is not α2-continuous at x. So for all δ > 0, there exist y, z ∈ (x − δ, x + δ) such that |f(y) − f(z)| ≥ α2 > α1. So f is also not α1-continuous at x.
Theorem 3. Let f be a function defined on R. Then Df is an Fσ set.
Proof. Since each set Df (α) is closed, we are done if we show
[ Df = Df (q). q∈Q+
Let x ∈ Df . Then there exists some α > 0 such that f is not α-continuous at x,
+ because otherwise f would be continuous at x. So x ∈ Df (α). Let q ∈ Q such that q < α. By Lemma 2, Df (α) ⊆ Df (q) and thus x ∈ Df (q). Now let x ∈ Df (q) for some q ∈ Q+. Since f is not q-continuous at x, we have that f is not continuous at x and thus x ∈ Df .
An example of a set that is not Fσ is the set of all irrational numbers in an interval. So there cannot exist a function that is discontinuous only at each irrational in a given interval. The proof of this is outside the scope of this paper.
4 1.3 Increasing Functions
We say that a function f is increasing if x < y implies f(x) ≤ f(y). We say that a function f is strictly increasing if x < y implies f(x) < f(y). Note that all strictly increasing functions are increasing. As we will see, the criteria for the set on which an increasing function is discontinuous are more restrictive then that for a general function.
Theorem 4. Let f : (0, 1) → R be increasing. Then for each c ∈ (0, 1), limx→c+ f(x)
− and limx→c f(x) exist.
Proof. Let c ∈ (0, 1). Note that since f is increasing, f is bounded below on (c, 1) by f(c). Therefore, by the completeness of the reals, f has a greatest lower bound on (c, 1) so we can define L = inf f(x). x∈(c,1)
Now let ε > 0. Since L is a greatest lower bound, there exists some y ∈ (c, 1) such that f(y) < L + ε. Let δ = y − c; note that y > c so δ > 0 and y = c + δ. This gives us
f(c + δ) < L + ε.
Since f is increasing, for any x ∈ (c, c + δ), we have
f(x) ≤ f(c + δ) < L + ε and thus f(x) − L < ε.
Since L is a lower bound for f on (c, 1), L ≤ f(x) and thus
|f(x) − L| = f(x) − L < ε,
thus showing that limx→c+ f(x) exists and is equal to L. Showing the existence of
5 limx→c− f(x) is similar.
Definition. Given a function f : D → R where D ⊆ R, we say that f has a jump discontinuity at x if limx→c+ f(x) and limx→c− f(x) both exist but are not equal.
Theorem 5. Let f : (0, 1) → R be increasing. Then for all c ∈ (0, 1), either f is continuous at c or f has a jump discontinuity at c.
Proof. Let c ∈ (0, 1). The limit limx→c f(x) either exists or doesn’t exist.
Assume that limx→c f(x) exists. We will now show that f(c) = limx→c f(x). As- sume for the sake of contradiction that f(c) 6= limx→c f(x). Then f(c) < limx→c f(x) or f(c) > limx→c. Assume f(c) < limx→c f(x). Then f(c) < limx→c− f(x). Then there exists some x < c such that f(x) > f(c) which is a contradiction because f is increas- ing. Assume f(c) > limx→c f(x). Then f(c) > limx→c+ f(x). Then there exists some x > c such that f(x) < f(x) which is a contradition because f is increasing. Thus if limx→c f(x) exists, it must equal f(c) so f is continuous at c.
Now assume that limx→c f(x) does not exist. By Theorem 4, limx→c+ f(x) and limx→c− f(x) both exist so we must have limx→c+ f(x) 6= limx→c− f(x) for limx→c f(x) to not exist. So f has a jump discontinuity at c.
Theorem 6. Let f : (0, 1) → R be increasing. Let Df denote the set of points at which f is discontinuous. Then Df is countable.
Proof. Informally, we prove this by choosing a rational number in the interval defined by each jump discontinuity. Since our function is increasing, no two such intervals will overlap, so each rational number chosen in this way will be unique. More formally, we will construct a function g : Df → Q that is one-to-one.
Let c ∈ Df . Then limx→c+ f(x) and limx→c− f(x) both exist with limx→c− f(x) < limx→c+ f(x). Let qc ∈ Q ∩ (limx→c− f(x), limx→c+ f(x)) and define g(c) = qc.
To see that g is one-to-one, let c, d ∈ Df such that g(c) = g(d). Assume for the sake of contradiction that c 6= d and without loss of generality, assume c < d. Since
6 g(c) ∈ (limx→c− f(x), limx→c+ f(x)) and g(d) ∈ (limx→d− f(x), limx→d+ f(x)), we must have lim f(x) > g(x) = g(d) > lim f(x). (1) x→c+ x→d−
Let {cn} be a decreasing sequence such that cn → c. Let {dn} be an increasing sequence such that dn → d. Since c < d and {cn} is decreasing and {dn} is increasing, there must exist some N such that for all n ≥ N, cn < dn. Since f is monotone, then for each n ≥ N, f(cn) < g(dn) implying that limx→c+ f(x) ≤ limx→d− f(x) which contradicts (1).
Can any countable set be the set of discontinuity for a monotone function? As it turns out, yes. We address this in the theorem below.
Theorem 7. Let D = {x1, x2, ...} ⊂ (0, 1). Then there exists a monotone function f : (0, 1) → (0, 1) such that f is discontinuous precisely at the points in D.
Proof. We define f as follows: X 1 f(x) = . 2n n:xn X 1 X 1 X 1 X 1 f(y) = = + = f(x) + ≥ f(x) 2n 2n 2n 2n n:xn Now we show that f is discontinuous at each xi by showing that the right and left hand limits of f at xi are unequal for any i. X 1 X 1 X 1 lim f(x) = lim ≥ > = f(xi). + + n n n x→xi x→xi 2 2 2 n:xn 7 X 1 X 1 lim f(x) = lim n ≤ n = f(xi). x→x− x→x− 2 2 i i n:xn So lim + f(x) and lim − f(x) are unequal and thus f is discontinuous at each x→xi x→xi xi. We now give a theorem about strictly increasing functions which will be useful later. Theorem 8. Let f : (0, 1) → (0, 1) be surjective and strictly increasing. Then f is continuous. Proof. Let x ∈ (0, 1). Let ε > 0. Let z1 ∈ (f(x) − ε, f(x)) ∩ (0, 1) and let z2 ∈ (f(x), f(x) + ε)) ∩ (0, 1). Since f is surjective, there exist y1, y2 ∈ (0, 1) such that f(y1) = z1 and f(y2) = z2. Since f is strictly increasing, y1 < x < y2. Let δ = min{x − y1, y2 − x}, note δ > 0. Let c ∈ (x − δ, x + δ). Assume c < x. Since f is increasing and since y1 ≤ x − δ, we have z1 ≤ f(x − δ) ≤ f(c) ≤ f(x) and thus f(x) − ε < c < f(x) + ε. The case where c > x is similar. We now consider properties of the derivatives of increasing functions. 2 Derivative of an Increasing Function In this section we will prove a major theorem about existence properties of the derivative of an increasing function. Then, we will briefly introduce The Cantor Function (for a more complete discussion of The Cantor Function, see [5]). 2.1 Existence Earlier we saw that it is possible to construct a function that is discontinuous everywhere, but if we want the function to be monotone, it can be discontinuous only at a countable set. Since differentiability implies continuity, it follows that we can construct a function 8 that is not differentiable anywhere. If we restrict our attention to continuous functions, this is still possible (see [1, p.146] for an example of a continuous, nowhere differentiable function). What if we restrict our attention to monotone functions? Can we construct a monotone function that is not differentiable anywhere? As we will see, such a function cannot exist - we will prove that the derivative of a monotone function must exist almost everywhere. But first, we need some preliminary results. ∞ Theorem 9 (Fatou’s Lemma). If {fn}n=1 is a sequence of nonnegative measurable func- tions and fn(x) → f(x) for almost all x in some set E, then Z Z f ≤ lim inf fn E E For a proof of Fatou’s Lemma, see [9]. Definition. Let I be an interval and let E be a set. Then l(I) denotes the length of I and m(E) denotes the Lebesgue measure of E. Definition. Let E ⊆ R. Let I be a collection of intervals of positive length that covers E. Then I covers E in the sense of Vitali if for each ε > 0 and for each x ∈ E, there exists an interval I ∈ I such that x ∈ I and l(I) < ε. We may also say that I is a Vitali covering of E. Lemma 10. Let E be a set of finite Lebesgue measure and let I be a set of closed intervals of positive length that covers E in the sense of Vitali. Then, given ε > 0, there exists a finite disjoint collection of intervals {I1,...,IN } such that N ! [ m E \ In < ε. n=1 Proof. Let O be an open set of finite measure that contains E. We first define the set I ∗ ⊆ I where each I ∈ I ∗ is contained in O, then we note that I ∗ is also a Vitali covering of E. To see this, we let ε > 0 and let x ∈ E. Since O is open, there exists a 9 δ > 0 such that (x − δ, x + δ) ⊂ O. Letε ˆ = min{ε, δ}. Since I is a Vitali covering of E, there exists some I ∈ I such that x ∈ I and l(I) < εˆ. Then, I is contained entirely within O so I ∈ I ∗. Since l(I) < ε, I ∗ is a Vitali covering of E. So, now we will assume without loss of generality that each I ∈ I is contained entirely within O. Now, we inductively create a sequence {In} of disjoint intervals in I . Let I1 be any Sn interval in I . Supposing that I1,...In have all been chosen. If E ⊆ k=1 Ik, then for SN any ε > 0, we have m E \ n=1 In < ε so we would be done. So, assume this is not the case. Now we show how to choose In+1. Let kn be the supremum of the lengths of intervals in I that do not intersect any of the intervals I1,...,In. Since each element of I is contained in O, we have kn ≤ m(O) < ∞. Then, we choose an interval In+1 ∈ I 1 such that In+1 is disjoint from all of I1,...,In and l(In+1) > 2 kn. We can choose In+1 to be disjoint from the finite collection I1,...,In because I is a Vitali covering, so In+1 1 can be arbitrarily small. We can choose In+1 to have length greater than 2 kn because kn is the supremum of the lengths of intervals that are disjoint from I1,...,In. S∞ Thus, we have an infinite sequence {In} of disjoint intervals of I . Since n=1 In ⊆ P∞ O, we have n=1 l(In) ≤ m(O) < ∞. This implies two things. First, it implies limn→∞ l(In) = 0. Otherwise there would exist some B > 0 such that for some positive P∞ integer k, for each n ≥ k, we would have l(In) ≥ B which would give n=1 In = ∞. Second, this implies that for any ε > 0, we can choose a positive integer N such that ∞ X l(In) < ε/5. n=N+1 Now, let N [ R = E \ In. n=1 If we show that m(R) < ε, then we will be done. SN SN Let x ∈ R. Note that R does not intersect n=1 In. Then, since n=1 In is the union of finitely many closed intervals and thus itself closed, we can find a δ > 0 such that 10 SN (x − δ, x + δ) does not intersect n=1 In. Then, since I is a Vitali covering, there exists SN an I ∈ I such that x ∈ I ⊂ (x − δ, x + δ). So I also does not intersect n=1 In. Now let n be a positive integer such that for all i ≤ n, I does not intersect Ii. Then, since kn is the supremum of the lengths of the intervals that do not intersect any of 1 I1,...,In, l(I) ≤ kn. By construction, l(In+1) > 2 kn which gives l(I) ≤ kn < 2l(In+1). Now, In+1 may intersect I, and for n large enough, it will because limi→∞ l(Ii) = 0 and each interval in I has positive length. So fix n to be the smallest integer such that I intersects In+1. Note that we must have n ≥ N and l(I) < kn < 2l(In+1). Since x ∈ I and since I has a point in common with In+1, the distance from x to the midpoint of 1 5 In+1 is at most l(I) + 2 l(In+1) ≤ 2 l(In). For each i, define Ji to be the interval that shares a midpoint with Ii that is five times the length. Then x must be contained in the interval Jn+1. Since x is an arbitrary point in R, we have that ∞ [ R ⊆ Ji. i=N+1 Hence ∞ ∞ X X m(R) ≤ l(Ji) = 5 l(Ii) < ε i=N+1 i=N+1 Definition. Let f be a function on the real numbers. The following four quantities are called the derivates of f at x: f(x + h) − f(x) f(x + h) − f(x) D+f(x) = lim ,D−f(x) = lim , h→0+ h h→0− h f(x + h) − f(x) f(x + h) − f(x) D+f(x) = lim ,D−f(x) = lim . h→0+ h h→0− h + By definition of limit superior and limit inferior, we have D f(x) ≥ D+f(x) and − + D f(x) ≥ D−f(x). The function f is differentiable at x if and only if D f(x) = 11 − D+f(x) = D f(x) = D−f(x) 6= ∞, and the derivative of f at x is the common value. Theorem 11 (Lebesgue’s Theorem). Let f :[a, b] → R be an increasing function. Then f is differentiable almost everywhere. Proof. We prove this theorem by showing that the subset of [a, b] whose elements have unequal derivates is of measure zero. We consider only the set E where for all x ∈ E, + D f(x) > D−f(x). The cases where other derivates are unequal is similarly handled. Now, let u, v be rational numbers with u > v and define + Eu,v = {x : D f(x) > u > v > D−f(x)}. Then we have the following identity: [ E = Eu,v. u,v∈Q,u>v This union is countable so if we show that an arbitrary Eu,v is of measure zero, then it follows that E is of measure zero and we are done. If Eu,v is empty, then we are done so assume that Eu,v is nonempty. Let s = m(Eu,v) and let ε > 0. Let O be an open set that contains Eu,v such that m(O) < s + ε. For an x ∈ Eu,v, we can choose an arbitrarily small h > 0 such that [x − h, x] is contained in O and so that f(x) − f(x − h) < v. (2) h For each x ∈ Eu,v, define Ix to be the collection of all intervals [x − h, x] that are contained in O and satisfy (2) and let = S . Then is a Vitali covering of I x∈Eu,v Ix I Eu,v so by Lemma 10, we can choose a finite, disjoint collection {I1,...,IN } of intervals in I such that N [ m(Eu,v \ In) < ε. n=1 12 For each n = 1,...,N let xn and hn be the x and h that were used to create In. Then for each n = 1,...,N, (2) gives f(xn) − f(xn − hn) < vhn. We can sum all of these to get N N X X f(xn) − f(xn − hn) < v hn. (3) n=1 n=1 Note that hn is the width of the interval In and that In’s are disjoint and contained PN within O which gives us n=1 hn ≤ m(O) < s + ε. Plugging this into (3) gives N X f(xn) − f(xn − hn) < v(s + ε). (4) n=1 SN Let An = Int(In) ∩ Eu,v and let A = n=1 An. Then, m(A) > s − ε. Since each point in A is in the interior of some In, then for each y ∈ A, we can choose an arbitrarily small k such that [y, y + k] is contained in some In; and since each y in A is also in Eu,v, + f(y+k)−f(y) we have D f(x) > u, so we can choose an arbitrarily small k such that k > u. Now, for each y ∈ A, let Jy be the collection of intervals [y, y + k] with k small enough S to meet both of the above conditions. Let J = y∈A Jy. Then J is a Vitali covering of A, so by Lemma 10, we can choose a finite disjoint collection {J1,...,JM } of intervals in J where Ji = [yi, yi + ki] such that M [ m(A \ Ji) < ε. i=1 Then we also have M ! M [ X m Ji = ki > m(A) − ε = s − 2ε. (5) i=1 i=1 13 Recall that by construction of the ks, we have f(yi+ki)−f(yi) > u and thus ki M M X X f(yi + ki) − f(yi) > u ki. (6) i=1 i=1 Then (5) and (6) give M X f(yi + ki) − f(yi) > (s − 2ε)u. (7) i=1 By construction, each interval Ji is contained within some In. Then for each i, let n be such that Ji ⊂ In. Since f is increasing, the following inequality holds: f(xn − hn) ≤ f(yi) ≤ f(yi + ki) ≤ f(xn). We can use this to get f(xn) − f(xn − hn) ≥ f(yi + ki) − f(yi). Note that each Ji is contained inside exactly one In. Now fix an n and consider all i such that Ji ∈ In. Since f is increasing and since the Jis are disjoint, we have X f(xn) − f(xn − hn) ≥ f(yi + ki) − f(yi). Ji⊂In Thus N M X X f(xn) − f(xn − hn) ≥ f(yi + ki) − f(yi). n=1 i=1 Applying (4) and (7) gives v(s + ε) > u(s − 2ε). Since this is true for each ε > 0, we must have vs ≥ us. But since u > v, then vs ≥ us can only be true if s = 0. 14 This shows that the function f(x + h) − f(x) g(x) = lim h→0 h is defined almost everywhere and that f is differentiable when g is finite. We now show that g must be finite almost everywhere. For each n ∈ N and each x ∈ [a, b], we define 1 g (x) = n f x + − f(x) n n and for each x > b, g(x) = f(b). Then, gn(x) → g(x) as n → ∞ for almost all x. So g is measurable. Since f is increasing, gn(x) ≥ 0 wherever gn exists. Then, by Theorem 9, Z Z Z b 1 g ≤ lim inf gn = lim inf n f x + − f(x) dx [a,b] [a,b] a n " Z Z # = lim inf n f − n f 1 1 [b,b+ n ] [a,a+ n ] " Z # = lim inf f(b) − n f 1 [a,a+ n ] ≤ f(b) − f(a). So g is integrable and thus finite almost everywhere. So f is differentiable almost everywhere. 2.2 The Cantor Function One of the first facts we learn about the derivative of a real valued function f is that for any point x in the domain of f, if f 0(x) exists, then the value of f 0(x) is the slope 15 of the line that is tangent to the graph of f at the point x. Thus it seems natural to assume that if a function f :[a, b] → R is increasing and f(a) < f(b), then f 0(x) must be positive somewhere between a and b. However, this is not true. The Cantor Function, whose graph is shown in Figure 2, is a counterexample to this intuition. Figure 2: The Cantor Function This function is interesting because it is continuous and its derivative is zero almost everywhere, yet its graph climbs from 0 to 1. It satisfies the following definition: Definition 12. A continuous, increasing function f :[a, b] → R that satisfies f(a) < f(b) and f 0(x) = 0 for almost all x ∈ [0, 1] is said to be singular. From looking at the graph of The Cantor Function, it isn’t that surprising that it is singular; the graph is horizontal in almost all places. This function is almost always constant. One might wonder, are there any singular functions that are never constant? Equivalently, are there any strictly increasing singular functions? The answer to this question is yes. We will see two such examples. 3 DeRham’s Function Our first example of a strictly increasing singular function is DeRham’s function. There are multiple ways that this function is usually defined. We discuss two of them and then 16 Figure 3: Graph of fˆ1 (x) 3 provide our own definition. We then use our new definition to provide alternative proofs of some known results. 3.1 A Natural Probability Question Say we have a fair coin that we flip an infinite number of times. We will write down a 0 each time the coin lands on tails and a 1 each time it lands on heads. After generating this infinite string of zeros and ones, we can append a decimal point to the front and interpret the result as a number between 0 and 1 expressed in binary. Call this number R. What does the cumulative distribution function for R look like? We will denote this 1 f(x). So f(x) is the probability that R < x. If R < 2k for some positive integer k, 1 1 1 then our first k flips must be tails which occurs with probability 2k . Thus, f( 2k ) = 2k . In fact, we can show that for any x ∈ [0, 1], we have f(x) = x. So in this case, the cumulative distribution function of R is not particularly interesting. However, if we 1 assume that our coin is unfair, landing on tails with probability a 6= 2 , and heads with probability (1 − a), then the cumulative distribution function of R, which we will denote ˆ fa(x), takes on some more interesting characteristics. Figures 3 and 4 show the graphs of fˆ1 (x) and fˆ2 (x). Note that these functions are not inverses of one another, although 3 3 they may appear to be at first glance. 17 Figure 4: Graph of fˆ2 (x) 3 3.2 Fractal Properties 1 Note that each part of the curve on either side of the line x = 2 is a shrunken version of the whole curve. More specifically, if we shrink the whole curve to half its size in the x direction and a times its size in the y direction, we get the left half of the curve and if we shrink the whole curve to half its size in the x direction and 1−a times its size in the y direction, we get the right half of the curve. We state this property more rigorously with the following theorem. ˆ Theorem 13. DeRham’s function fa(x) satisfies the following: afˆ (2x) if 0 ≤ x ≤ 1 ˆ a 2 fa(x) = . ˆ 1 (1 − a)fa(2x − 1) + a if 2 < x ≤ 1 Proof. Let x = 0.b1b2b3 ... be a binary representation of x that does not terminate. ˆ Recall that fa(x) = P (R < x) and let R = 0.r1r2r3 ... be the binary representation of R. 1 1 Now assume 0 ≤ x ≤ 2 . Then b1 = 0 (assuming that if x = 2 , then it is expressed as 0.0111 ...). Then, R < x if and only if r1 = 0 and 0.r2r3r4 . . . < 0.b2b3b4 .... We have r1 = 0 with probability a. Since x1 = 0, 2x = 0.b2b3b4 .... Thus 0.r2r3r4 . . . < 0.b2b3b4 ... 18 ˆ 1 ˆ ˆ with probability fa(2x). So if 0 ≤ x ≤ 2 , then fa(x) = afa(2x). 1 Now assume 2 < x ≤ 1. Then b1 = 1. So if r1 = 1, then we must have 0.r2r3r4 . . . < 0.b2b3b4 .... Also, 2x − 1 = 0.b2b3b4 .... So 0.r2r3r4 . . . < 0.b2b3b4 ... ˆ is true with probability fa(2x − 1). If r1 = 0, then the values of r2, r3, r4,... are unre- 1 stricted. Since r1 = 1 with probability 1 − a and r1 = 0 with probability a, if 2 < x ≤ 1, ˆ ˆ then fa(x) = (1 − a)fa(2x − 1) + a. Not only does DeRham’s function satisfy the formula given in Theorem 13, it is the unique continuous solution to that functional equation. See [3] for more details. ˆ 1 DeRham uses this definition to prove that fa is singular for all a 6= 2 . In [6], Kawamura provides an alternate definition and uses it to explicitly identify sets of real ˆ0 ˆ0 numbers on which fa(x) = 0 and the sets on which fa(x) does not exist. 3.3 Algorithmic Definition ˆ We now define a function fa(x) with an algorithm. We will later show that fa(x) = fa(x) for all x. Then we use this new definition to get alternative proofs for the aforementioned results of DeRham and Kawamura. Algorithm 1 Given input x ∈ [0, 1], generate fa(x) Suppose x has binary representation 0.d1d2d3 ... L1 ← 0 U1 ← 1 M0 ← 1 for n = 1, 2, 3,... do Mn ← (1 − a)Ln + aUn if dn = 1 then Ln+1 ← Mn Un+1 ← Un else Un+1 ← Mn Ln+1 ← Ln end if end for All three of the sequences defined converge to a common value. This value is fa(x). 19 ∞ We can think of this algorithm as defining a sequence of closed intervals {[Ln,Un]}n=1. Note that if dk = 0, then the length of [Lk,Uk] is a times the size of [Lk−1,Uk−1], and if dk = 1, then the length of [Lk,Uk] is 1 − a times the size of [Lk−1,Uk−1]. From this, we can see that the lengths of the intervals tends to 0 as n → ∞, and since the intervals are closed, then there is exactly one point contained in all of them. This justifies our claim at the end of the algorithm. It is also worth noting that after k iterations of the algorithm, the largest value that fa(x) could take is Uk, which occurs when all the digits after the kth are 1. Similarly, the smallest value that fa(x) could take is Lk, which occurs when all the digits after the kth are 0. Each dyadic rational has two distinct binary representations. One such representa- tion ends in an infinite string of zeros, while the other ends in an infinite string of ones. For any dyadic x, we can assume either binary representation and our algorithm will generate the same value of fa(x). In the case of infinite zeros, Lk will eventually be fixed at fa(x), while Uk converges down to fa(x). In the case of infinite ones, Uk will become fixed at fa(x), while Lk converges up to fa(x). In the next section, we prove ˆ that fa(x) = fa(x) - that our algorithmic definition indeed defines DeRham’s function. Before we get to that, we prove some other results about fa(x). 3.4 Analytic Properties Now we will use our algorithmic definition of fa(x) to show that it is strictly increasing and singular. First we show that it is strictly increasing. Theorem 14. For a ∈ (0, 1), the function fa(x) is strictly increasing. Proof. Let x, y ∈ [0, 1] such that x < y. Consider the binary expansions of x and y and if either is dyadic, assume it is expressed in the terminating way. Since x < y, then there exists some k such that x has a 0 in the kth digit and y has a 1. Let i be th the smallest such k. Then, if Lix,Uix and Liy,Uiy are the i endpoints for x and y 20 respectively, then Uix = Liy. We already know that fa(y) ≥ Liy. Also, fa(x) ≤ Uix. th However, if fa(x) = Uix, then that would require x to have only ones from the i point on. This would make x a dyadic rational, but expressed in the non-terminating way, and we assumed that x would be terminating if dyadic. So fa(x) < Uix = Liy ≤ fa(y). Now we show that fa(x) is singular. We start by showing continuity. Theorem 15. For a ∈ (0, 1), the function fa(x) is continuous. Proof. We claim that fa(x) is surjective. Let y ∈ [0, 1]. We now generate an infinite ∞ ∞ string b1b2b3 ... of zeros and ones. We now define sequences {wn}n=1, {un}n=1 and ∞ {mn}n=1 as follows. Let w1 = 0 and u1 = 1. For each n ∈ N, let mn = awn + (1 − a)un. If mn ≥ y, then let un+1 = wn, let wn+1 = wn and let bn = 1 Otherwise, if mn < y, then let un+1 = un, let wn+1 = mn and let bn = 0 Now we can interpret 0.b1b2b3 ... as a number in [0, 1] expressed in binary. Let x take this value. Then f(x) = y. Since fa(x) is surjective and strictly increasing, Theorem 8 implies that fa(x) is continuous. 0 Before we prove that fa(x) = 0 when it exists, we need to show that for any function f, we can approximate f 0(x) with secant lines. We do this with two lemmas. ∞ ∞ Lemma 16. Let {xn}n=1 and {yn}n=1 and x be such that xn → x and yn → x as n → ∞. ∞ Let {rn}n=1 be a sequence such that 0 ≤ rn ≤ 1 for all n. Then rnxn + (1 − rn)xn → x as n → ∞. Proof. Let ε > 0. Let N be such that for all n ≥ N, |xn − x| < ε and |yn − x| < ε. 21 Then, |rnxn + (1 − rn)yn − x| = |rn(xn − x) + (1 − rn)(yn − x)| ≤ rn|xn − x| + (1 − rn)|yn − x| < rnε + (1 − rn)ε = ε. ∞ ∞ Lemma 17. Let f be a function that is differentiable at x. Let {an}n=1 and {bn}n=1 be such that an → x and bn → x as n → ∞ where an ≤ x ≤ bn with an 6= x or bn 6= x for all n. Then f(b ) − f(a ) n n → f 0(x) bn − an as n → ∞. Proof. We start with the following algebraic manipulation: f(b ) − f(a ) n n = bn − an f(b ) − f(x) + f(x) − f(a ) n n = bn − an b − x f(b ) − f(x) x − a f(x) − f(a ) n n + n n . bn − an bn − x bn − an x − an Then note that f(bn)−f(x) → f 0(x) and f(x)−f(an) → f 0(x) as n → ∞. Also note that bn−x x−an since a ≤ x ≤ b for all n, 0 ≤ bn−x ≤ 1 for all n. Since x−an = 1 − bn−x , we can n n bn−an bn−an bn−an apply Lemma 16 to get b − x f(b ) − f(x) x − a f(x) − f(a ) n n + n n → f 0(x) bn − an bn − x bn − an x − an as n → ∞. 22 1 0 Theorem 18. For a 6= 2 and 0 ≤ x ≤ 1, we have fa(x) = 0 if it exists. Proof. Let x ∈ [0, 1] and let 0.b1b2b3 ... be the binary expansion of x. Define − xn = 0.b1b2 . . . bn000 ..., + xn = 0.b1b2 . . . bn111 .... Now, note that for any n we have the following: − Ln = fa(xn ) (8) + Un = fa(xn ) (9) − + xn ≤ x ≤ xn (10) 1n x+ − x− = . (11) n n 2 We can use the above to approximate the derivative of fa at x with a secant line. We define + − fa(xn ) − fa(xn ) dn(x) = + − . xn − xn The value dn(x) is the slope of a secant line through the graph of fa near x. So by 0 0 Lemma 17, if fa(x) exists, then dn(x) → fa(x) as n → ∞. Applying (8),(9) and (11) to our definition of dn(x) gives Un − Ln dn(x) = 1 n . 2 Now, for each n, define kn(x) = kn = is the number of 1s in the first n binary digits of x. As we noted earlier, if dk = 0, then the length of [Lk,Uk] is a times the size of [Lk−1,Uk−1] and if dk = 1, then the length of [Lk,Uk] is 1−a times the size of [Lk−1,Uk−1]. 23 Therefore, since the length of [L1,U1] = [0, 1] is 1, the length of [Ln,Un] must be (1 − a)kn an−kn . So + − kn n−kn fa(xn ) − fa(xn ) = Un − Ln = (1 − a) a , (12) which gives kn n−kn kn n−kn (1 − a) a (1 − a) a kn n−kn dn(x) = n = = (2 − 2a) (2a) . (13) 1 1 kn 1 n−kn 2 2 2 Now that we have a concrete expression for the value of each dn(x), we can start to ∞ figure out the limit of the sequence {dn(x)}n=1. Recall that this is of interest because if 0 fa(x) exists, then its value must equal the limit of the sequence in (13). Now assume that dn(x) → c as n → ∞ for some real constant c. If c 6= 0, then we must have dn+1(x) → c = 1. We will now show that the sequence { dn+1(x) }∞ cannot dn(x) c dn(x) n=1 ∞ converge to 1. From this, it will follow that the sequence {dn(x)}n=1 either does not converge, or that it converges to 0. Thus the derivative of fa(x) must equal 0 wherever it exists. So now we will show that dn+1(x) does not tend to 1. Note that dn(x) d (x) (2 − 2a)kn+1 (2a)n+1−kn+1 n+1 kn+1−kn 1+kn−kn+1 = k n−k = (2 − 2a) (2a) . dn(x) (2 − 2a) n (2a) n Since ki denotes the number of 1s in the first i digits of x, we must have either kn+1 = kn or k = k + 1. The first case gives dn+1(x) = 2a and the second case gives dn+1(x) = n+1 n dn(x) dn(x) 2(1 − a). So each term in our sequence is either 2a or 2(a − 1). So if our sequence is to converge, it must converge to 2a or to 2(1 − a). Neither one of these values is 1 since 1 a 6= 2 . So if the derivative of fa exists at a point x, it must be 0. 1 0 Corollary 19. For a 6= 2 , fa(x) = 0 almost everywhere. Proof. We get this from the previous theorem and Lebesgue’s theorem (Theorems 18 and 11). Now we show that our definition of DeRham’s function is equivalent to the probability 24 definition. Theorem 20. For all x, fa(x) = P (R < x). + + Proof. First, we show by induction that for all x and for all n, fa(xn ) = P (R < xn ) and − + + − fa(xn ) = P (R < xn ). For the base case, note that for all x, we have x0 = 1 and x0 = 0 and P (R < 1) = 1 = fa(1) and P (R < 0) = 0 = fa(0). For the inductive step, assume + + − − th that fa(xn ) = P (R < xn ) and fa(xn ) = P (R < xn ). If bn+1 = 0 (the (n + 1) digit of − − x), then xn = xn+1 so we’re done by the induction hypothesis. If bn+1 = 1, then − fa(xn+1) = (1 − a)Ln + aUn = Ln + a(Un − Ln) = − + − − − + fa(xn ) + a(fa(xn ) − fa(xn )) = P (R < xn ) + aP (xn ≤ R < xn ). − − + th This represents the probability that R < xn or xn < R < xn and the (n + 1) digit − of R is 0. Then we are done because R < xn+1 if and only if these conditions are met. + Showing the inductive step for f(xn+1) is similar. − If x is a dyadic rational, then x = xn for some n so we are done. Otherwise, if x is − + not a dyadic rational, then for all n, we have xn < x < xn . Since fa(x) is increasing, it − + − follows that fa(xn ) < fa(x) < fa(xn ). Applying the above gives P (R < xn ) < fa(x) < + − + − + P (R < xn ). Since xn < x < xn , we also have P (R < xn ) < P (R < x) < P (R < xn ). − + As stated earlier, the sequences defined by P (R < xn ) and P (R < xn ) both converge to the same real number. Since the above inequalities are true for all n, this means that P (R < x) = fa(x). 0 It is natural to wonder where fa(x) exists and where it does not exist. We start off by showing that this function is not differentiable at the dyadic rationals. 0 Theorem 21. Let x ∈ [0, 1] be a dyadic rational. Then fa(x) does not exist. Proof. Since x is a dyadic rational, it has a binary representation that ends with an infinite string of 0s. Let 0.b1b2 . . . bi000 ... be this representation. Recall the definition 25 0 0 of dn(x) from the proof of Theorem 18 and recall that if fa(x) exists, then dn(x) → fa(x). Also kn n−kn dn(x) = (2 − 2a) (2a) . Now, assume that n ≥ i. Recall that we are working with the binary representation of th x such that each binary digit of x beyond the i digit is a 0. This means that kn = ki which gives kn n−kn ki n−ki dn(x) = (2 − 2a) (2a) = (2 − 2a) (2a) = ki i−ki n−i n−i (2 − 2a) (2a) (2a) = di(x)(2a) . 1 So if a > 2 , we have dn(x) → ∞ as n → ∞, implying that the derivative doesn’t 1 exist. Note that if a < 2 then dn(x) → 0 as n → ∞. However, we can define a sequence analogous to dn(x) that is based on the other binary representation of x (the 1 representation that ends in an infinite string of 1s) that will tend to infinity when a < 2 . Let 0.e1e2 . . . ej111 ... be the binary representation of x such that each binary digit th of x beyond the j digit is a 1. We now define dn(x) exactly as before. Assume n ≥ j. This means that j − kj = n − kn which gives kn n−kn dn(x) = (2 − 2a) (2a) = kn−kj kj j−kj n−j (2 − 2a) (2 − 2a) (2a) = (2 − 2a) dj(x). 1 So if 2 − 2a > 1, or equivalently, if a < 2 , then dn(x) → ∞ as n → ∞. It is also 1 interesting to note that for this way of defining dn(x), if a > 2 then dn(x) → 0 as n → ∞. 26 0 3.5 Points at Which fa(x) Exists We will now identify a subset of [0, 1] where fa is differentiable. This will have to do with what we will call the density of zeros and ones in x [6]. Heuristically, the density of zeros in x is the probability that a digit chosen at random from the binary representation of x is a zero. The density of ones in x is similarly defined. Since we already know that fa is not differentiable at any dyadic rational, we will not bother with the dyadic rationals. This means that in defining the density of zeros and ones in x, we will not need to worry about non-unique binary representations. We introduce the following notation Definition 22. Given an x ∈ [0, 1] such that x is not a dyadic rational, we define pn to be the position of the nth zero in the binary representation of x. In addition, we define the two functions n D0(x) = lim , n→∞ pn D1(x) = 1 − D0(x). There do exist numbers for which the densities of 0s and 1s do not exist.