Functions of bounded variation in one and multiple dimensions

Submitted by: Simon Breneis
Carried out at: Institute of Analysis
Evaluator: Univ.-Prof. Dr. Aicke Hinrichs
Co-supervisor: O.Univ.-Prof. Dr.phil. Dr.h.c. Robert Tichy
April 2020

Master's thesis submitted in fulfilment of the requirements for the academic degree Diplom-Ingenieur in the Master's programme Mathematics in the Natural Sciences

JOHANNES KEPLER UNIVERSITÄT LINZ
Altenbergerstraße 69, 4040 Linz, Österreich
www.jku.at
DVR 0093696
Statutory Declaration

I hereby declare under oath that I have written this Master's thesis independently and without outside help, that I have not used any sources or aids other than those indicated, and that I have marked all passages taken verbatim or in substance from other works as such. This Master's thesis is identical to the electronically submitted text document.

Place, Date                                             Signature
Abstract
In this Master's thesis, we investigate the properties of functions of bounded variation. First, we consider univariate functions; afterwards we generalize this notion to higher dimensions. There are many different definitions of multivariate functions of bounded variation. We study functions of bounded variation in the senses of Vitali; Hardy and Krause; Arzelà; and Hahn. Many results for those functions of bounded variation were previously only known in the bivariate case. We extend them to arbitrary dimensions, and also add some new results.
Contents

Statutory Declaration
1 Introduction
2 Functions of one variable
  2.1 Motivation and definition
  2.2 Variation functions
  2.3 Decomposition into monotone functions
  2.4 Continuity, differentiability and measurability
  2.5 Signed Borel measures
  2.6 Dimension of the graph
  2.7 Structure of BV
  2.8 Ideal structure of BV
  2.9 Fourier series
3 Functions of multiple variables
  3.1 Definitions
  3.2 The variation functions
  3.3 Closure properties
  3.4 Decompositions into monotone functions
  3.5 Inclusions
  3.6 Continuity, differentiability and measurability
  3.7 Signed Borel measures
  3.8 Dimension of the graph
  3.9 Product functions
  3.10 Structure of the function spaces
  3.11 Ideal structure of the function spaces
4 The Koksma-Hlawka inequality
  4.1 Harman variation
  4.2 D-variation
  4.3 Koksma-Hlawka inequality for the Hahn-variation
  4.4 Other estimates
Literature
1 Introduction
The goal of this Master's thesis is to study the properties of functions of bounded variation. We study univariate functions of bounded variation in Section 2 and multivariate functions of bounded variation in Section 3. Finally, in Section 4, we give an application of functions of bounded variation to numerical integration. We define the variation of a univariate function f : [a, b] → R by

\[
\operatorname{Var}(f; a, b) := \sup\Bigl\{ \sum_{i=1}^{n} \bigl| f(x_i) - f(x_{i-1}) \bigr| : a = x_0 \le x_1 \le \dots \le x_n = b \text{ for some } n \in \mathbb{N} \Bigr\}.
\]
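As a quick numerical illustration of this definition, the inner sum can be evaluated for any fixed partition; for a monotone function it telescopes to f(b) − f(a) regardless of the partition. The following Python sketch is our own illustration (the helper name is made up) and not part of the thesis:

```python
# Illustration of the variation definition: the sum of absolute increments
# of f along a partition; for monotone f it telescopes to f(b) - f(a).

def variation_on_partition(f, partition):
    """Sum of |f(x_i) - f(x_{i-1})| along an ordered partition."""
    return sum(abs(f(x) - f(y)) for x, y in zip(partition[1:], partition[:-1]))

# Monotone example on [0, 1]: f(x) = x**2, so Var(f; 0, 1) = f(1) - f(0) = 1.
partition = [i / 100 for i in range(101)]
v = variation_on_partition(lambda x: x * x, partition)
assert abs(v - 1.0) < 1e-12
```

For a non-monotone function the sum depends on the partition, and the variation is obtained only in the supremum over all partitions.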
If the interval [a, b] is clear from the context, we also write Var(f) := Var(f; a, b). If the variation Var(f) is finite, we say that f is of bounded variation. Functions of bounded variation were first introduced by Jordan in [31] in the study of Fourier series. By now, they have many applications, for example in the study of Riemann-Stieltjes integrals.

In Section 2.1 we motivate and define functions of bounded variation in one dimension, albeit using different (non-standard) notation. The reason is that the standard notation is quite messy and hard to read in the higher-dimensional setting. Hence, to prepare the reader for Section 3, we already use notation that can be extended more easily to multivariate functions. Furthermore, we give many examples and non-examples of functions of bounded variation.

In Section 2.2 we study variation functions. To every function f : [a, b] → R of bounded variation, we can associate its variation function Var_f : [a, b] → R defined by

\[
\operatorname{Var}_f(x) := \operatorname{Var}(f; a, x).
\]

A function and its variation function share many regularity properties. For example, f is continuous if and only if Var_f is continuous, and f is Lipschitz continuous if and only if Var_f is Lipschitz continuous, see also Theorem 2.2.5. It is also easy to see that f is α-Hölder continuous if Var_f is α-Hölder continuous. The reverse direction seems to be an open problem. We answer this problem negatively with Example 2.2.6 and prove a more general statement in Theorem 2.2.17.

In Section 2.3 we give a proof of the famous result by Jordan that a function is of bounded variation if and only if it can be written as the difference of two increasing functions, see Theorem 2.3.2. Therefore, the set BV of functions of bounded variation is the vector space induced by the monotone functions.

Section 2.4 deals with the regularity properties of functions of bounded variation.
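Jordan's decomposition mentioned above admits a short explicit sketch via the variation function. The following is the standard construction (our sketch, not necessarily the proof given in Theorem 2.3.2):

```latex
% Jordan decomposition via the variation function Var_f (standard construction):
%   f = g - h with both g and h increasing on [a, b].
g(x) := \tfrac{1}{2}\bigl(\operatorname{Var}_f(x) + f(x)\bigr), \qquad
h(x) := \tfrac{1}{2}\bigl(\operatorname{Var}_f(x) - f(x)\bigr), \qquad
f = g - h.
% Monotonicity: for x \le y,
%   g(y) - g(x) = \tfrac{1}{2}\bigl(\operatorname{Var}(f; x, y) + f(y) - f(x)\bigr) \ge 0,
% since |f(y) - f(x)| \le \operatorname{Var}(f; x, y).
```

The same inequality |f(y) − f(x)| ≤ Var(f; x, y) shows that h is increasing as well.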
We show that functions of bounded variation can have at most countably many discontinuities, and that they are differentiable almost everywhere and Borel-measurable, see Theorem 2.4.2 and Theorem 2.4.6.

Section 2.5 illustrates the connection between functions of bounded variation and measures. Indeed, there is a natural correspondence between right-continuous functions of bounded variation and finite signed Borel measures, see Theorem 2.5.5.

It is a well-known result that the graph of a function of bounded variation has Hausdorff-dimension 1. In 2010, Liang proved in [38] that continuous functions of bounded variation also have Box-dimension 1, using Riemann-Liouville fractional integrals. We give a much more elementary proof of this fact in Section 2.6 and show that it is not necessary to require the function to be continuous, see Theorem 2.6.9.

Kuller proved in [35] that BV is a commutative Banach algebra with respect to pointwise multiplication. We give a proof of this fact in Section 2.7 (Theorem 2.7.8), and we also show Helly's First Theorem, which tells us that the unit ball of BV satisfies a weaker form of compactness, see Theorem
2.7.11. Consequently, in Section 2.8 we characterize the maximal ideal space of BV in Theorem 2.8.7.

Finally, in Section 2.9 we give an application of functions of bounded variation to the study of Fourier series. In particular, we prove some famous results due to Jordan, including that the Fourier series of a function of bounded variation converges pointwise (although not necessarily exactly to the function itself) in Theorem 2.9.14, and that the Fourier series of a continuous function of bounded variation converges uniformly to the function itself in Theorem 2.9.23.

The generalization of functions of bounded variation to the multidimensional setting is not immediately clear. Indeed, there are many different definitions. We study the variations in the sense of Vitali; Hardy and Krause; Arzelà; Hahn; and Pierpont. We denote by V, HK, A, H and P the corresponding sets of functions of bounded variation. In Section 3.1 we define those variations and give examples of functions of bounded variation of the various kinds. Since we show in Theorem 3.5.1 that the variations in the sense of Hahn and Pierpont are equivalent, thus extending a similar two-dimensional result by Clarkson and Adams in [13], we rarely treat the Pierpont-variation in the following chapters.

In Section 3.2, we again define the variation functions similarly as in the one-dimensional setting. Unfortunately, many results that hold for univariate functions do not extend to multivariate functions. However, in Theorem 3.2.6 we prove some previously unknown regularity correspondences similar to those in Theorem 2.2.5, especially for functions of bounded Arzelà-variation.

In Section 3.3, we prove that V, HK, A and H are vector spaces, and that HK, A and H are closed under multiplication and division (given that the denominator is bounded away from 0), see Proposition 3.3.1 and Proposition 3.3.3. Most of those results were already known, although some of them had only been proved for bivariate functions.

Similarly to the one-dimensional setting, we state monotone decomposition theorems for functions in A, V and HK in Theorem 3.4.1, Theorem 3.4.2 and Theorem 3.4.3, respectively. Naturally, since the various definitions of bounded variation do not coincide, we use different definitions of monotonicity in those theorems. We remark that those decompositions were already known, although some of them again only in two dimensions.

In Section 3.5 we study the relations between the various kinds of bounded variation. We are able to extend some (but not all) previously known results from the two-dimensional setting to arbitrary dimensions in Theorem 3.5.1.

Section 3.6 deals with the regularity properties of multivariate functions of bounded variation, where we again extend some results from the two-dimensional setting to arbitrary dimensions. In particular, we show that functions in HK, A and H are continuous almost everywhere (Theorem 3.6.1) and thus also Lebesgue-measurable, that functions in HK and A are differentiable almost everywhere (Theorem 3.6.13), and that functions in HK are Borel-measurable (Theorem 3.6.18).

In Section 3.7 we state the correspondence between right-continuous functions in HK and finite signed Borel measures, which is the precise generalization of Theorem 2.5.5 to the multidimensional setting.

Verma and Viswanathan proved in 2020 in [49] that the graph of a bivariate continuous function of bounded Hahn-variation has Hausdorff- and Box-dimension 2. In Section 3.8, we extend this result to arbitrary dimensions and get rid of the continuity condition.

All the higher-dimensional variations we consider are generalizations of the one-dimensional concept. In particular, for univariate functions the notions of variation are all equivalent. We show this already in Proposition 3.1.17.
Therefore, one might hope that we can also prove that the variations coincide for product functions, i.e. functions that are the product of one-dimensional functions.
Adams and Clarkson already noted in [1] that such connections exist for bivariate functions, although their statements were a bit imprecise and they offered few proofs. Hence, we study those product functions in Section 3.9, and show in Corollary 3.9.10 that under rather weak conditions all kinds of variations are equivalent for product functions.

Blümlinger and Tichy proved in [10] that HK is a Banach algebra with respect to pointwise multiplication. We show that A, H and P are also Banach algebras in Theorem 3.10.3, Theorem 3.10.8 and Corollary 3.10.12, respectively.

Finally, in Section 3.11 we study the maximal ideal space of the Banach algebras HK and A. Blümlinger already characterized the maximal ideal space of HK in [9], and we aim to do the same for A. However, it turns out that A has far more maximal ideals than HK, which becomes especially clear in Proposition 3.11.12. Therefore, we were unsuccessful in obtaining a characterization.

In Section 4 we study the Koksma-Hlawka inequality. The Koksma-Hlawka inequality bounds the error of approximating the integral of a function f : [0, 1]^d → R using the quadrature rule

\[
\int_{[0,1]^d} f(x) \, dx \approx \frac{1}{n} \sum_{i=1}^{n} f(x_i)
\]

for some point set P_n := {x_1, ..., x_n} ⊂ [0, 1]^d. The Koksma-Hlawka inequality states that

\[
\Bigl| \int_{[0,1]^d} f(x) \, dx - \frac{1}{n} \sum_{i=1}^{n} f(x_i) \Bigr| \le \operatorname{Var}_{HK}(f) \, \frac{\|D\|_\infty}{n}, \tag{1.1}
\]
where Var_HK(f) is the Hardy-Krause variation of f and D is the discrepancy function, which characterizes how well-distributed the point set P_n is. Hence, we are able to split the error of integration into a product of two factors, one only depending on the function and one only depending on the point set. However, there are many simple functions (like indicator functions of rotated boxes, see Example 3.1.9) that are not of bounded Hardy-Krause-variation. For those functions, the Koksma-Hlawka inequality is useless. Therefore, there have been many efforts to prove a similar inequality using a less restrictive notion of variation. To this end, we also discuss more recent concepts like the Harman variation and the D-variation. Moreover, in Theorem 4.3.1 we prove a previously unknown inequality similar to (1.1) using the Hahn-variation, which is a lot less restrictive than the Hardy-Krause-variation.
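The splitting of the integration error into a function factor and a point-set factor can be watched numerically. The following Python sketch is our own toy illustration (the midpoint lattice merely stands in for a genuine low-discrepancy point set) comparing the equal-weight quadrature rule with plain Monte Carlo for a smooth integrand:

```python
import random

# Toy illustration of the equal-weight quadrature rule behind (1.1):
# approximate the integral of f over [0,1]^2 by (1/n) * sum f(x_i).
# The regular midpoint lattice below is only a stand-in for a genuine
# low-discrepancy point set.

def quadrature(f, points):
    """Equal-weight quadrature rule (1/n) * sum of f over the point set."""
    return sum(f(p) for p in points) / len(points)

f = lambda p: p[0] * p[1]          # integral over [0,1]^2 is exactly 1/4
m = 32
lattice = [((i + 0.5) / m, (j + 0.5) / m) for i in range(m) for j in range(m)]
err_lattice = abs(quadrature(f, lattice) - 0.25)

random.seed(0)
mc = [(random.random(), random.random()) for _ in range(m * m)]
err_mc = abs(quadrature(f, mc) - 0.25)

assert err_lattice < 1e-12         # the lattice integrates x*y exactly
assert err_mc < 0.1                # Monte Carlo error is merely O(n^{-1/2})
```

For this separable integrand the lattice average factors into the product of two one-dimensional midpoint averages, which is why its error vanishes; a well-distributed point set keeps the discrepancy factor in (1.1) small for general integrands of bounded Hardy-Krause variation.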
2 Functions of one variable
We start by studying one-dimensional functions of bounded variation. The definition of those functions goes back to Jordan, see for example [30, 31], who studied functions of bounded variation in the late nineteenth century, mainly in the context of Fourier series. Most results of this section are common knowledge. The main sources for writing this chapter were the books by Carothers ([12]), Folland ([18]), Royden ([44]), Rudin ([45]) and Yeh ([50]), as well as the book by Appell, Banaś and Merentes ([5]), which is a comprehensive introduction to functions of bounded variation. However, whenever possible, we cite the mathematician who first proved a theorem.

First, we define the variation of a function and give many examples of functions of bounded variation. Next, we study the variation function, which captures the variation on subintervals and shares many properties with its parent function. We also solve an open problem on the Hölder continuity of the variation function. Then, we study the monotone decomposition of functions of bounded variation, which gives us a useful tool for extending regularity properties of monotone functions, such as almost everywhere continuity and differentiability as well as Borel-measurability, to functions of bounded variation. Furthermore, we investigate the connection between functions of bounded variation and signed Borel measures, as well as the Hausdorff and box dimension of the graph of functions of bounded variation, where we improve on a previously known result. Next, we study the functional-analytic and algebraic structure of the space of functions of bounded variation, and, finally, we give an application to Fourier series.
2.1 Motivation and definition
Let γ : [a, b] → R² be a parametrization of a "nice" curve C = γ([a, b]). How can we define the length of C? One possibility is to approximate C by a polygon with nodes a = t₀ ≤ t₁ ≤ ... ≤ t_n = b. The length of this polygon is

\[
\sum_{j=1}^{n} \bigl\| \gamma(t_j) - \gamma(t_{j-1}) \bigr\|_2. \tag{2.2}
\]

Here, ‖.‖₂ denotes the Euclidean norm on R². If we include more partition points and the curve C is smooth enough, the resulting polygons should approximate C better. It is thus reasonable to define the length of C as the limit (or better, the supremum) as the partition gets finer and finer, i.e.

\[
\ell(C) := \sup \sum_{j=1}^{n} \bigl\| \gamma(t_j) - \gamma(t_{j-1}) \bigr\|_2,
\]

where the supremum is taken over all partitions.

Similarly, one can interpret the graph of a function f : [a, b] → R as a curve in two dimensions. The variation of f then captures the vertical changes in the graph of f. To properly define this variation or vertical change of f, we introduce some notation. First, instead of partitions of an interval, we consider ladders. There are two basic differences. First, a ladder usually does not contain the end point b, whereas a partition does. Second, we treat a ladder as an unordered set. Thereby we get rid of an index, making the notation more readable in higher dimensions.

Definition 2.1.1. A ladder 𝒴 on the interval [a, b] is a finite subset of {a} ∪ (a, b) with a ∈ 𝒴. In particular, b is in 𝒴 if and only if a = b.

Let y ∈ 𝒴. Then we define the successor y₊ of y as the smallest element in 𝒴 larger than y. If there is no such element, we define y₊ as b. Similarly, we define the predecessor y₋ of y as the largest element in 𝒴 smaller than y. If there is no such element, we define y₋ as a. Finally, we define the predecessor b₋ of b as the largest element of 𝒴.

We denote by 𝕐 = 𝕐[a, b] = 𝕐(a, b) = 𝕐(ℐ) the set of ladders on ℐ.

The somewhat awkward inclusion of the case a = b will be useful in higher dimensions.
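The ladder bookkeeping (successors defaulting to b, predecessors to a) can be made concrete in a few lines. The following Python sketch of Definition 2.1.1 is our own illustration, with made-up helper names:

```python
# Sketch of the ladder notion of Definition 2.1.1: a ladder on [a, b] is a
# finite subset of {a} union (a, b) containing a; the successor of y defaults
# to b and the predecessor of y defaults to a when no ladder element qualifies.

def successor(ladder, y, b):
    """Smallest element of the ladder larger than y, or b if none exists."""
    larger = [z for z in ladder if z > y]
    return min(larger) if larger else b

def predecessor(ladder, y, a):
    """Largest element of the ladder smaller than y, or a if none exists."""
    smaller = [z for z in ladder if z < y]
    return max(smaller) if smaller else a

ladder = {0.0, 0.25, 0.5}                    # a ladder on [0, 1]
assert successor(ladder, 0.25, 1.0) == 0.5
assert successor(ladder, 0.5, 1.0) == 1.0    # no larger element -> b
assert predecessor(ladder, 0.25, 0.0) == 0.0
```

Treating the ladder as an unordered set, as here, is exactly what removes the partition index from the notation.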
For the results on one-dimensional functions, however, we assume from now on that a < b. For a ladder 𝒴 ∈ 𝕐 and a function f : ℐ → R, we write

\[
\Delta^{\mathcal{Y}} f := \sum_{y \in \mathcal{Y}} \bigl| f(y_+) - f(y) \bigr|
\]

for the variation of f on the ladder 𝒴; passing to a finer ladder can only increase this quantity (Proposition 2.1.2). Therefore, we define the variation of a function as the supremum over all variations on ladders, as finer ladders capture more of the oscillation of a function.

Definition 2.1.3. For a function f : ℐ → R with ℐ = [a, b], we define its total variation as

\[
\operatorname{Var}(f; \mathcal{I}) := \operatorname{Var}(f; a, b) := \sup_{\mathcal{Y} \in \mathbb{Y}} \Delta^{\mathcal{Y}} f.
\]

If the interval ℐ is clear from the context, we also write Var(f) instead of Var(f; ℐ). We say that f is of bounded (total) variation if Var(f) < ∞. Finally, we denote by BV := BV[a, b] the set of functions on [a, b] with bounded total variation.

Example 2.1.4. Indicator functions of intervals are of bounded total variation. For example, if [c, d] ⊂ (a, b), then the indicator function 1_{[c,d]} has total variation 2.

Example 2.1.5. Monotone functions are of bounded variation. If f is monotonically increasing, then

\[
\operatorname{Var}(f; a, b) = \sup_{\mathcal{Y} \in \mathbb{Y}} \sum_{y \in \mathcal{Y}} \bigl| f(y_+) - f(y) \bigr| = \sup_{\mathcal{Y} \in \mathbb{Y}} \sum_{y \in \mathcal{Y}} \bigl( f(y_+) - f(y) \bigr) = f(b) - f(a) < \infty.
\]

The same holds for monotonically decreasing f, except for a sign change.

Example 2.1.6. Lipschitz continuous functions are also of bounded variation. Let f be a Lipschitz continuous function with Lipschitz constant L. Then

\[
\operatorname{Var}(f; a, b) = \sup_{\mathcal{Y} \in \mathbb{Y}} \sum_{y \in \mathcal{Y}} \bigl| f(y_+) - f(y) \bigr| \le \sup_{\mathcal{Y} \in \mathbb{Y}} \sum_{y \in \mathcal{Y}} L \, |y_+ - y| = L(b - a) < \infty.
\]

Example 2.1.7. Absolutely continuous functions are of bounded variation. Recall that a function f : [a, b] → R is called absolutely continuous if for all ε > 0 there exists a δ > 0 such that for all finite families of disjoint open subintervals {(a₁, b₁), ..., (a_n, b_n)} of [a, b] with Σᵢ₌₁ⁿ (bᵢ − aᵢ) ≤ δ, we have

\[
\sum_{i=1}^{n} \bigl| f(b_i) - f(a_i) \bigr| \le \varepsilon.
\]

To prove that such functions are of bounded variation, choose ε = 1 and take δ > 0 as in the definition of absolute continuity. Define the ladder

\[
\mathcal{Y}^* := \Bigl\{ y \in [a, b) : y = a + \tfrac{k}{2}\delta \text{ for some } k \in \mathbb{N}_0 \Bigr\}.
\]

Clearly, 𝒴* contains

\[
n := \Bigl\lceil \frac{2(b-a)}{\delta} \Bigr\rceil
\]

points. For a ladder 𝒴 ∈ 𝕐 we define the ladder 𝒴′ := 𝒴 ∪ 𝒴*.
Then, we define the ladders

\[
\mathcal{Y}_k := \mathcal{Y}' \cap \Bigl[ a + \tfrac{k}{2}\delta, \; a + \tfrac{k+1}{2}\delta \Bigr)
\]

on [a + (k/2)δ, a + ((k+1)/2)δ] for k = 0, ..., n. We apply the absolute continuity of f to the intervals induced by the ladder 𝒴_k on [a + (k/2)δ, a + ((k+1)/2)δ] and get

\[
\sum_{y \in \mathcal{Y}} \bigl| f(y_+) - f(y) \bigr| \le \sum_{y \in \mathcal{Y}'} \bigl| f(y_+) - f(y) \bigr| = \sum_{k=0}^{n} \sum_{y \in \mathcal{Y}_k} \bigl| f(y_+) - f(y) \bigr| \le \sum_{k=0}^{n} 1 = n + 1 < \infty.
\]

Thus, f is of bounded variation.

Example 2.1.8. The length of a curve C with parametrization γ : [a, b] → R² and γ(t) = (x(t), y(t)) can be analogously defined using ladders by

\[
\ell(C) := \sup_{\mathcal{Y} \in \mathbb{Y}} \sum_{y \in \mathcal{Y}} \bigl\| \gamma(y_+) - \gamma(y) \bigr\|_2.
\]

We call C rectifiable if ℓ(C) < ∞. Then C is rectifiable if and only if both x and y are of bounded total variation. This illustrates that a curve is rectifiable if and only if the horizontal and vertical variations of that curve are finite. The equivalence follows immediately from the observation that

\[
\max\bigl\{ |x(t) - x(s)|, \; |y(t) - y(s)| \bigr\} \le \bigl\| \gamma(t) - \gamma(s) \bigr\|_2 \le |x(t) - x(s)| + |y(t) - y(s)|.
\]

Example 2.1.9. If f : [a, b] → R is differentiable, then one can prove that

\[
\operatorname{Var}(f; a, b) \ge \int_a^b |f'(x)| \, dx. \tag{2.3}
\]

Thus, the function f(x) = x sin(x⁻¹) is of unbounded variation on [0, 1]. In particular, we show in Theorem 2.4.6 that functions of bounded variation are differentiable almost everywhere and also satisfy (2.3). Moreover, if f is absolutely continuous, we have equality in (2.3), as was shown in [5, Theorem 3.19].

Example 2.1.10. Other examples of functions of unbounded variation are the indicator function 1_ℚ on [0, 1], or paths of Brownian motion, which are of unbounded variation with probability 1.

2.2 Variation functions

The variation function of a function f : [a, b] → R captures the variation of f on all the intervals [a, x] for x ∈ [a, b]. We show that variation functions are increasing and that they share many regularity properties with their parent functions.
However, we also show that the variation function of a Hölder continuous function need not be Hölder continuous, solving an open problem.

Definition 2.2.1. The variation function Var_f : [a, b] → [0, ∞] of a function f : [a, b] → R is defined as

\[
\operatorname{Var}_f(x) := \operatorname{Var}(f; a, x).
\]

Conversely, f is called the parent function of Var_f.

First, we note that variation functions are always increasing.

Proposition 2.2.2. Let f : ℐ → R be a function and let c ∈ ℐ. Then

\[
\operatorname{Var}(f; a, b) = \operatorname{Var}(f; a, c) + \operatorname{Var}(f; c, b).
\]

In particular, for a ≤ x ≤ y ≤ b,

\[
\operatorname{Var}_f(y) - \operatorname{Var}_f(x) = \operatorname{Var}(f; x, y) \ge 0,
\]

and the variation function Var_f is increasing.

Proof. If c = b, this is trivial. Assume that c < b and let 𝒴 be a ladder on ℐ. By Proposition 2.1.2 we may assume that c ∈ 𝒴. Define the ladders 𝒴₁ := {y ∈ 𝒴 : y < c} and 𝒴₂ := {y ∈ 𝒴 : y ≥ c} on [a, c] and [c, b], respectively. Then

\[
\Delta^{\mathcal{Y}}(f; a, b) = \Delta^{\mathcal{Y}_1}(f; a, c) + \Delta^{\mathcal{Y}_2}(f; c, b).
\]

Taking the supremum over all ladders 𝒴 ∈ 𝕐[a, b] yields

\[
\operatorname{Var}(f; a, b) \le \operatorname{Var}(f; a, c) + \operatorname{Var}(f; c, b),
\]

since we can assume without loss of generality that every ladder in 𝕐[a, b] contains c. Conversely, let 𝒴₁ and 𝒴₂ be ladders on [a, c] and [c, b], respectively. Then 𝒴 := 𝒴₁ ∪ 𝒴₂ is a ladder on [a, b] and

\[
\Delta^{\mathcal{Y}}(f; a, b) = \Delta^{\mathcal{Y}_1}(f; a, c) + \Delta^{\mathcal{Y}_2}(f; c, b).
\]

Taking the supremum over all ladders 𝒴₁ ∈ 𝕐[a, c] and 𝒴₂ ∈ 𝕐[c, b] yields

\[
\operatorname{Var}(f; a, b) \ge \operatorname{Var}(f; a, c) + \operatorname{Var}(f; c, b).
\]

This implies the desired equality.

We prove a slight generalization of the above proposition.

Lemma 2.2.3. Let f : [a, b] → R be a function with f(a+) = f(a). Let b = z₀ > z₁ > z₂ > ... be a strictly decreasing sequence in [a, b] that converges to a. Then

\[
\operatorname{Var}(f; a, b) = \sum_{n=0}^{\infty} \operatorname{Var}(f; z_{n+1}, z_n).
\]

Proof. First, the series converges (potentially to infinity), since all the terms are non-negative. Applying Proposition 2.2.2, we have for k ∈ ℕ that

\[
\operatorname{Var}(f; a, b) \ge \operatorname{Var}(f; z_k, z_0) = \sum_{n=0}^{k-1} \operatorname{Var}(f; z_{n+1}, z_n).
\]

Taking k to infinity yields

\[
\operatorname{Var}(f; a, b) \ge \sum_{n=0}^{\infty} \operatorname{Var}(f; z_{n+1}, z_n).
\]
On the other hand, let ε > 0 and let 𝒴 be a ladder on [a, b]. Let k ∈ ℕ be such that a < z_k < a₊ and |f(a) − f(z_k)| < ε. Such a k exists, since z_k → a and therefore f(z_k) → f(a). Proposition 2.2.2 yields

\[
\begin{aligned}
\sum_{y \in \mathcal{Y}} \bigl| f(y_+) - f(y) \bigr| &= \sum_{y \in \mathcal{Y} \setminus \{a\}} \bigl| f(y_+) - f(y) \bigr| + \bigl| f(a_+) - f(a) \bigr| \\
&\le \operatorname{Var}(f; a_+, b) + \bigl| f(a_+) - f(z_k) \bigr| + \bigl| f(z_k) - f(a) \bigr| \\
&\le \operatorname{Var}(f; a_+, b) + \operatorname{Var}(f; z_k, a_+) + \varepsilon = \operatorname{Var}(f; z_k, b) + \varepsilon \\
&= \sum_{n=0}^{k-1} \operatorname{Var}(f; z_{n+1}, z_n) + \varepsilon \le \sum_{n=0}^{\infty} \operatorname{Var}(f; z_{n+1}, z_n) + \varepsilon.
\end{aligned}
\]

Since ε > 0 was arbitrary,

\[
\sum_{y \in \mathcal{Y}} \bigl| f(y_+) - f(y) \bigr| \le \sum_{n=0}^{\infty} \operatorname{Var}(f; z_{n+1}, z_n).
\]

Taking the supremum over all ladders 𝒴 ∈ 𝕐[a, b] yields

\[
\operatorname{Var}(f; a, b) \le \sum_{n=0}^{\infty} \operatorname{Var}(f; z_{n+1}, z_n),
\]

which proves the lemma.

The variation function and its parent function share many regularity properties. To state these connections, we need some definitions.

Definition 2.2.4. Let (X, d) be a metric space. Let 𝒞(X) denote the set of continuous functions on X.

A function f : X → R is called Lipschitz continuous if there exists a constant L > 0 such that for all x₁, x₂ ∈ X,

\[
\bigl| f(x_1) - f(x_2) \bigr| \le L \, d(x_1, x_2). \tag{2.4}
\]

Furthermore, we denote by

\[
\operatorname{lip}(f) := \operatorname{lip}(f; X) := \sup_{x_1 \ne x_2} \frac{\bigl| f(x_1) - f(x_2) \bigr|}{d(x_1, x_2)}
\]

the minimal Lipschitz constant L in (2.4). The set of Lipschitz continuous functions on X is denoted by Lip = Lip(X).

A function f : X → R is called α-Hölder continuous (with 0 < α < 1) if there exists a constant L > 0 such that for all x₁, x₂ ∈ X,

\[
\bigl| f(x_1) - f(x_2) \bigr| \le L \, d(x_1, x_2)^\alpha. \tag{2.5}
\]

Furthermore, we denote by

\[
\operatorname{lip}_\alpha(f) := \operatorname{lip}_\alpha(f; X) := \sup_{x_1 \ne x_2} \frac{\bigl| f(x_1) - f(x_2) \bigr|}{d(x_1, x_2)^\alpha}
\]

the minimal Hölder constant L in (2.5). The set of α-Hölder continuous functions on X is denoted by Lip_α = Lip_α(X).

Let ℐ = [a, b] be an interval. Then a function f : ℐ → R is called absolutely continuous if for all ε > 0 there exists a δ > 0 such that for every finite sequence of pairwise disjoint intervals (x_k, y_k) ⊂ ℐ that satisfies

\[
\sum_k (y_k - x_k) < \delta,
\]

we have

\[
\sum_k \bigl| f(y_k) - f(x_k) \bigr| < \varepsilon.
\]
The set of all absolutely continuous functions on [a, b] is denoted by AC = AC[a, b] = AC(ℐ). Finally, for an interval ℐ, 𝒞¹(ℐ) denotes the set of continuously differentiable real-valued functions.

We remind the reader that the inclusions

\[
\mathcal{C}^1 \subseteq \operatorname{Lip} \subseteq \operatorname{Lip}_\alpha \subseteq \operatorname{Lip}_\beta \subseteq \mathcal{C}
\]

hold for α ≥ β.

Before stating the connections between the variation function and its parent function, we define the left- and right-side limits of a function f at x₀ as

\[
f(x_0-) := \lim_{\varepsilon \downarrow 0} f(x_0 - \varepsilon) \quad \text{and} \quad f(x_0+) := \lim_{\varepsilon \downarrow 0} f(x_0 + \varepsilon),
\]

if they exist. The following theorem is a selection of statements due to Huggins [29] and Russell [46].

Theorem 2.2.5. Let f : [a, b] → R be a function and let Var_f be its variation function. Then the following statements hold.

1. The function f is of bounded variation if and only if the function Var_f is of bounded variation. Moreover, in this case we have Var(Var_f) = Var(f).
2. If f is of bounded variation, then f is (left-/right-)continuous if and only if Var_f is (left-/right-)continuous.
3. If f is of bounded variation, then f is Lipschitz continuous if and only if Var_f is Lipschitz continuous. Moreover, in this case we have lip(Var_f) = lip(f).
4. If f is of bounded variation, then f is α-Hölder continuous if Var_f is α-Hölder continuous. Moreover, in this case we have lip_α(Var_f) ≥ lip_α(f).
5. If f is of bounded variation, then f is absolutely continuous if and only if Var_f is absolutely continuous.

Proof. 1. First, let f be of bounded variation. Since Var_f is increasing and Var_f(a) = 0, it is easy to see that

\[
\operatorname{Var}(\operatorname{Var}_f) = \operatorname{Var}_f(b) - \operatorname{Var}_f(a) = \operatorname{Var}_f(b) = \operatorname{Var}(f),
\]

implying that Var_f is of bounded variation. Conversely, let Var_f be of bounded variation. Then Var(f) = Var_f(b) < ∞, as otherwise the variation of Var_f would be undefined.

2. We prove that f is right-continuous if and only if Var_f is right-continuous. Similarly, f is left-continuous if and only if Var_f is left-continuous.
Together, this shows that f is continuous if and only if Var_f is continuous.

Let f be right-continuous at x. Let ε > 0 be arbitrary and let δ > 0 be such that |f(x) − f(x+h)| < ε/2 for all 0 ≤ h < δ. Let 𝒴₀ ∈ 𝕐[x, b] be such that

\[
\sum_{y \in \mathcal{Y}_0} \bigl| f(y_+) - f(y) \bigr| \ge \operatorname{Var}(f; x, b) - \varepsilon/2.
\]

Using Proposition 2.1.2, we can assume that there is a y₀ ∈ 𝒴₀ with x < y₀ < x + δ, so that

\[
\operatorname{Var}(f; x, b) - \varepsilon/2 \le \bigl| f(y_0) - f(x) \bigr| + \operatorname{Var}(f; y_0, b) \le \varepsilon/2 + \operatorname{Var}(f; y_0, b).
\]

Hence,

\[
0 \le \operatorname{Var}_f(y_0) - \operatorname{Var}_f(x) = \operatorname{Var}(f; a, y_0) - \operatorname{Var}(f; a, x) = \operatorname{Var}(f; x, y_0) = \operatorname{Var}(f; x, b) - \operatorname{Var}(f; y_0, b) \le \varepsilon,
\]

and since Var_f is increasing, the same bound holds for all points in (x, y₀). Thus, Var_f is right-continuous at x.

On the other hand, if Var_f is right-continuous at x, then it follows from Proposition 2.2.2 that

\[
\bigl| f(x+h) - f(x) \bigr| \le \operatorname{Var}(f; x, x+h) = \operatorname{Var}(f; a, x+h) - \operatorname{Var}(f; a, x) = \operatorname{Var}_f(x+h) - \operatorname{Var}_f(x),
\]

which implies that f is right-continuous at x.

3. If f is Lipschitz continuous and a ≤ x ≤ y ≤ b, then

\[
\bigl| \operatorname{Var}_f(x) - \operatorname{Var}_f(y) \bigr| = \operatorname{Var}(f; x, y) \le \operatorname{lip}(f) \, |x - y|
\]

by Example 2.1.6. Conversely, if Var_f is Lipschitz continuous and a ≤ x ≤ y ≤ b, then

\[
\bigl| f(y) - f(x) \bigr| \le \operatorname{Var}(f; x, y) = \operatorname{Var}_f(y) - \operatorname{Var}_f(x) \le \operatorname{lip}(\operatorname{Var}_f) \, |y - x|.
\]

4. If Var_f is α-Hölder continuous and a ≤ x ≤ y ≤ b, then

\[
\bigl| f(y) - f(x) \bigr| \le \operatorname{Var}(f; x, y) = \operatorname{Var}_f(y) - \operatorname{Var}_f(x) \le \operatorname{lip}_\alpha(\operatorname{Var}_f) \, |y - x|^\alpha.
\]

5. Let f be absolutely continuous. Let ε > 0 and let δ > 0 be such that for all finite disjoint sequences of intervals (x_k, y_k) ⊂ ℐ with

\[
\sum_k (y_k - x_k) < \delta \tag{2.6}
\]

we have

\[
\sum_k \bigl| f(y_k) - f(x_k) \bigr| < \varepsilon.
\]

Let (x₁, y₁), ..., (x_n, y_n) be a disjoint sequence of intervals satisfying (2.6). On the interval [x_k, y_k] we can find a ladder 𝒴_k = {y_{k,1}, ..., y_{k,m_k}} with

\[
\sum_{y \in \mathcal{Y}_k} \bigl| f(y_+) - f(y) \bigr| + \varepsilon/n \ge \operatorname{Var}(f; x_k, y_k).
\]

Then

\[
\sum_{k=1}^{n} \bigl( \operatorname{Var}_f(y_k) - \operatorname{Var}_f(x_k) \bigr) = \sum_{k=1}^{n} \operatorname{Var}(f; x_k, y_k) \le \sum_{k=1}^{n} \Bigl( \sum_{y \in \mathcal{Y}_k} \bigl| f(y_+) - f(y) \bigr| + \frac{\varepsilon}{n} \Bigr) \le 2\varepsilon
\]

by the absolute continuity of f. This shows that Var_f is absolutely continuous. Conversely, assume that Var_f is absolutely continuous.
Let ε > 0 and let δ > 0 be such that for all finite disjoint sequences of intervals (x_k, y_k) ⊂ ℐ with (2.6) we have

\[
\sum_k \bigl( \operatorname{Var}_f(y_k) - \operatorname{Var}_f(x_k) \bigr) < \varepsilon.
\]

Then

\[
\sum_k \bigl| f(y_k) - f(x_k) \bigr| \le \sum_k \bigl( \operatorname{Var}_f(y_k) - \operatorname{Var}_f(x_k) \bigr) < \varepsilon,
\]

implying that f is absolutely continuous.

Note the asymmetry in the fourth statement of the preceding theorem. In fact, it seems to be an open question whether the reverse direction holds (see [5, p. 80]). Here, we show with the following example that the reverse does not hold.

Example 2.2.6. Let 0 < α < 1. We construct a function f that is of bounded variation and α-Hölder continuous, such that Var_f is γ-Hölder continuous for no γ ∈ (0, 1).

First, consider the following general example. Let x₁ > x₂ > ... > 0 be a sequence with x_n → 0 and let (y_n) be a sequence with y₂ > y₄ > ... > 0, y_{2n−1} = 0 for n ∈ ℕ and y_n → 0. Define the function f : [0, x₁] → R as f(x_n) = y_n and interpolate linearly in between. An example of such a function is shown in the picture below.

[Figure: the blue graph is the function f on the interval [x₁₁, x₁]; the red graph is the function x ↦ x^α.]

The values y_{2n} were chosen smaller than x_{2n}^α in order to ensure that f is α-Hölder continuous at 0. It remains to choose the sequences (x_n) and (y_n) appropriately.

First, the variation function Var_f is easy to determine. Using Lemma 2.2.3, we have

\[
\operatorname{Var}_f(x_{2n-1}) = \operatorname{Var}(f; 0, x_{2n-1}) = 2 \sum_{k=n}^{\infty} y_{2k}.
\]

We want Var_f to be γ-Hölder continuous for no γ ∈ (0, 1). In order to achieve this, we can choose the sequence (y_n) to be decreasing as slowly as possible. Since f should be of bounded variation, however, it needs to fall faster than n⁻¹, as otherwise the series diverges. Therefore, we set

\[
y_{2n} = \frac{1}{2n \, (\log(n+1))^2}.
\]

With this choice, f is of bounded variation, since

\[
\operatorname{Var}(f) = \operatorname{Var}_f(x_1) = \sum_{n=1}^{\infty} \frac{1}{n \, (\log(n+1))^2} < \infty.
\]

Now we have to choose the sequence (x_n).
Its decay should be slow enough so that f is α-Hölder continuous, but fast enough so that Var_f is γ-Hölder continuous for no γ ∈ (0, 1). We set

\[
x_{2n-1} = n^{-\beta}
\]

for an appropriate choice of β > 0 that remains to be determined, and

\[
x_{2n} = \frac{x_{2n-1} + x_{2n+1}}{2}.
\]

First, note that

\[
\operatorname{Var}_f(n^{-\beta}) = \operatorname{Var}_f(x_{2n-1}) = \sum_{k=n}^{\infty} \frac{1}{k \, (\log(k+1))^2} \ge \int_{n+1}^{\infty} \frac{1}{x (\log x)^2} \, dx = \frac{1}{\log(n+1)}.
\]

Therefore, for γ ∈ (0, 1), we have

\[
\sup_{x \in (0, x_1]} \frac{\operatorname{Var}_f(x)}{x^\gamma} \ge \sup_{n \in \mathbb{N}} \frac{\operatorname{Var}_f(n^{-\beta})}{n^{-\beta\gamma}} \ge \sup_{n \in \mathbb{N}} \frac{n^{\beta\gamma}}{\log(n+1)} = \infty,
\]

since βγ > 0. Hence, Var_f is not γ-Hölder continuous, regardless of our choice of β > 0.

It remains to ensure that f is α-Hölder continuous. First, f needs to be α-Hölder continuous at 0, i.e.

\[
\sup_{x \in (0, x_1]} \frac{f(x)}{x^\alpha} \le \sup_{n \in \mathbb{N}} \frac{f(x_{2n})}{x_{2n+3}^\alpha} \le \sup_{n \in \mathbb{N}} \frac{\frac{1}{2n(\log(n+1))^2}}{(n+2)^{-\alpha\beta}} = \sup_{n \in \mathbb{N}} \frac{(n+2)^{\alpha\beta}}{2n(\log(n+1))^2} \le \sup_{n \in \mathbb{N}} \frac{(3n)^{\alpha\beta}}{2n(\log 2)^2} \le \frac{3^{\alpha\beta}}{2(\log 2)^2} \sup_{n \in \mathbb{N}} n^{\alpha\beta - 1} < \infty.
\]

Therefore, we choose β such that 0 < β ≤ α⁻¹.

Second, due to the specific structure of f, it is apparent that

\[
\begin{aligned}
\sup_{x, y \in (0, x_1]} \frac{|f(x) - f(y)|}{|x - y|^\alpha}
&= \sup_{n \in \mathbb{N}} \frac{f(x_{2n})}{\bigl( \frac{x_{2n-1} - x_{2n+1}}{2} \bigr)^\alpha}
 = \sup_{n \in \mathbb{N}} \frac{\frac{1}{2n(\log(n+1))^2}}{\bigl( \frac{n^{-\beta} - (n+1)^{-\beta}}{2} \bigr)^\alpha} \\
&\le \sup_{n \in \mathbb{N}} \frac{2^{\alpha-1}}{n (\log 2)^2 \, \bigl( (n+1)^{-\beta-1} \bigr)^\alpha}
 = \frac{2^{\alpha-1}}{(\log 2)^2} \sup_{n \in \mathbb{N}} \frac{(n+1)^{\alpha(\beta+1)}}{n} \\
&\le \frac{2^\alpha}{(\log 2)^2} \sup_{n \in \mathbb{N}} \frac{(n+1)^{\alpha(\beta+1)}}{n+1}
 \le \frac{2^\alpha}{(\log 2)^2} \sup_{n \in \mathbb{N}} n^{\alpha(\beta+1) - 1}.
\end{aligned}
\]

The last supremum is finite if α(β+1) − 1 ≤ 0, i.e. if β ≤ α⁻¹ − 1. Hence, f is α-Hölder continuous if

\[
0 < \beta \le \min\{\alpha^{-1}, \, \alpha^{-1} - 1\} = \alpha^{-1} - 1.
\]

Since α < 1, the choice of such a β > 0 is possible. Therefore, the function f constructed this way is α-Hölder continuous, but Var_f is γ-Hölder continuous for no γ ∈ (0, 1).

We can greatly generalize the above result. To this end, we introduce moduli of continuity.

Definition 2.2.7. A continuous, increasing function ω : [0, ∞) → [0, ∞) with ω(0) = 0 is called a modulus of continuity.

We remark that this is not the most general definition used for moduli of continuity.
Often, the requirement that ω is increasing is dropped, and continuity is replaced with continuity at zero. The reason for our more restrictive definition is to achieve simpler and clearer statements and better consistency with the coming definitions. Proposition 2.2.10 illustrates, however, that our definition is in some sense the most general one.

Moduli of continuity are usually not used by themselves. Instead, they are helpful in characterizing how continuous a given function is.

Definition 2.2.8. Let ℐ ⊂ R be a bounded or unbounded interval and let f : ℐ → R be a function. A modulus of continuity ω is called a modulus of continuity for f if for all x, y ∈ ℐ, we have

\[
\bigl| f(x) - f(y) \bigr| \le \omega\bigl( |x - y| \bigr).
\]

Examples of moduli of continuity are x ↦ Lx and x ↦ Lx^α for 0 < α ≤ 1. They characterize the Lipschitz and the α-Hölder continuous functions with Lipschitz and α-Hölder constant L, respectively.

It is easy to see that, given a function f and two moduli of continuity ω₁ ≤ ω₂, if ω₁ is a modulus of continuity for f, so is ω₂. In that sense, larger moduli of continuity represent weaker continuity conditions. In particular, to every continuous function we can associate its minimal modulus of continuity.

Definition 2.2.9. Let ℐ ⊂ R be a bounded or unbounded interval and let f : ℐ → R be a continuous function. The minimal modulus of continuity of f is defined as

\[
\omega_f(h) := \sup\bigl\{ |f(x) - f(y)| : x, y \in \mathcal{I}, \; |x - y| \le h \bigr\}.
\]

We state the following facts about minimal moduli of continuity.

Proposition 2.2.10. Let f : [a, b] → R be a continuous function. Then ω_f is a modulus of continuity for f, is subadditive and satisfies ω_{ω_f} = ω_f. Moreover, if ω is a modulus of continuity for f, then ω_f ≤ ω.

Proof. It is obvious from the definition that ω_f(0) = 0 and that ω_f is increasing. Furthermore, note that ω_f(h) is finite for all h ∈ [0, ∞). This is because f is continuous on the compact set [a, b], and hence bounded.
We show that $\omega_f$ is subadditive. Let $s, t \ge 0$. Then
$$\omega_f(s+t) = \sup\big\{|f(x) - f(y)| : x, y \in [a,b],\ |x-y| \le s+t\big\}$$
$$= \sup\big\{|f(x) - f(z) + f(z) - f(y)| : x, y, z \in [a,b],\ |x-z| \le s,\ |z-y| \le t\big\}$$
$$\le \sup\big\{|f(x) - f(z)| + |f(z) - f(y)| : x, y, z \in [a,b],\ |x-z| \le s,\ |z-y| \le t\big\}$$
$$\le \sup\big\{|f(x) - f(z)| : x, z \in [a,b],\ |x-z| \le s\big\} + \sup\big\{|f(z) - f(y)| : y, z \in [a,b],\ |y-z| \le t\big\}$$
$$= \omega_f(s) + \omega_f(t).$$
Next, we show that $\omega_f$ is continuous at zero. Since $f$ is continuous on the compact set $[a,b]$, it is uniformly continuous. Hence, for all $\varepsilon > 0$ there exists a $\delta > 0$ such that $|f(x) - f(y)| \le \varepsilon$ for all $|x - y| \le \delta$ with $x, y \in [a,b]$. In particular, $\omega_f(\delta) \le \varepsilon$. Since $\varepsilon$ was arbitrary and $\omega_f$ is increasing, we have $\omega_f(0+) = \omega_f(0) = 0$.

Now we prove that $\omega_f$ is continuous everywhere. Let $t, h > 0$. Since $\omega_f$ is subadditive and increasing,
$$\omega_f(t) \le \omega_f(t+h) \le \omega_f(t) + \omega_f(h).$$
Taking $h$ to zero and using that $\omega_f(0+) = 0$ yields that $\omega_f$ is right-continuous. The left-continuity of $\omega_f$ follows similarly from
$$\omega_f(t) \le \omega_f(t-h) + \omega_f(h) \le \omega_f(t) + \omega_f(h).$$
Altogether, $\omega_f$ is continuous. We have shown that $\omega_f$ is a modulus of continuity. Now it is trivial that $\omega_f$ is also a modulus of continuity for $f$.

To show that $\omega_{\omega_f} = \omega_f$, let $h \ge 0$. Since $\omega_f$ is increasing,
$$\omega_{\omega_f}(h) = \sup\big\{|\omega_f(x) - \omega_f(y)| : x, y \ge 0,\ |x-y| \le h\big\} = \sup\big\{\omega_f(x+h) - \omega_f(x) : x \ge 0\big\} \ge \omega_f(0+h) - \omega_f(0) = \omega_f(h).$$
On the other hand, since $\omega_f$ is subadditive,
$$\omega_{\omega_f}(h) = \sup\big\{\omega_f(x+h) - \omega_f(x) : x \ge 0\big\} \le \sup\big\{\omega_f(x) + \omega_f(h) - \omega_f(x) : x \ge 0\big\} = \omega_f(h).$$
Finally, let $\omega$ be another modulus of continuity for $f$. If there exists an $h \ge 0$ with $\omega(h) < \omega_f(h)$, then there are two points $x, y \in [a,b]$ with $|x-y| \le h$ and $|f(x) - f(y)| > \omega(h)$. Since $\omega$ is a modulus of continuity for $f$, and since $\omega$ is increasing,
$$\omega(h) < |f(x) - f(y)| \le \omega(|x-y|) \le \omega(h),$$
a contradiction.

The fourth statement of Theorem 2.2.5 can be easily generalized to moduli of continuity.

Proposition 2.2.11.
Let $f : [a,b] \to \mathbb{R}$ be a continuous function of bounded variation. Then $\omega_f \le \omega_{\mathrm{Var}_f}$.

Proof. Since $f$ is continuous and of bounded variation, $\mathrm{Var}_f$ is also continuous by Theorem 2.2.5. Therefore, $\omega_{\mathrm{Var}_f}$ is well-defined. Now, for $a \le x \le y \le b$ with $y - x \le h$, we have with Proposition 2.2.2 that
$$|f(y) - f(x)| \le \mathrm{Var}(f; x, y) = \mathrm{Var}_f(y) - \mathrm{Var}_f(x) \le \omega_{\mathrm{Var}_f}(y - x) \le \omega_{\mathrm{Var}_f}(h).$$
Taking the supremum over all $x$ and $y$ as above yields $\omega_f(h) \le \omega_{\mathrm{Var}_f}(h)$.

Our goal is to show that the converse of Proposition 2.2.11 does not hold. In fact, given two (almost arbitrary) moduli of continuity $\omega, \omega'$, we show that there exists a function $f$ of bounded variation with $\omega_f \le \omega$ but $\omega_{\mathrm{Var}_f} \ge \omega'$.

We require a modulus of continuity to be increasing and continuous. However, we need additional regularity properties. The following lemmas show that we can assume those regularity properties without loss of generality.

Lemma 2.2.12. Let $\omega$ be a bounded modulus of continuity. Then there exists a modulus of continuity $\omega' \ge \omega$ with $\omega'(h) = \omega'(1)$ for all $h \ge 1$.

Proof. Clearly, the function
$$\omega'(h) = \begin{cases} \omega(h) + \big(\|\omega\|_{\infty} - \omega(1)\big)h & h \in [0,1] \\ \|\omega\|_{\infty} & h \in (1,\infty) \end{cases}$$
is a modulus of continuity, $\omega' \ge \omega$, and $\omega'(h) = \omega'(1)$ for $h \ge 1$.

Lemma 2.2.13. Let $\omega$ be a modulus of continuity with $\omega(h) = \omega(1)$ for $h \ge 1$. Then $\omega_{\omega} \ge \omega$, and $\omega_{\omega}(h) = \omega_{\omega}(1)$ for $h \ge 1$.

Proof. First, for $h \ge 0$ we have
$$\omega_{\omega}(h) = \sup\big\{|\omega(x) - \omega(y)| : x, y \ge 0,\ |x-y| \le h\big\} \ge \omega(h) - \omega(0) = \omega(h).$$
Second, notice that $0 = \omega(0) \le \omega(h) \le \omega(1)$ for all $h \ge 0$, since $\omega$ is increasing. Hence,
$$\omega_{\omega}(h) = \sup\big\{|\omega(x) - \omega(y)| : x, y \ge 0,\ |x-y| \le h\big\} \le \omega(1) - \omega(0) = \omega(1).$$
On the other hand, for $h \ge 1$,
$$\omega_{\omega}(h) = \sup\big\{|\omega(x) - \omega(y)| : x, y \ge 0,\ |x-y| \le h\big\} \ge \omega(1) - \omega(0) = \omega(1).$$
Hence, $\omega_{\omega}(h) = \omega_{\omega}(1) = \omega(1)$ for $h \ge 1$.

Lemma 2.2.14. Let $\omega$ be a modulus of continuity that satisfies $\omega_{\omega} = \omega$ and $\omega(h) = \omega(1)$ for $h \ge 1$. Then there exists a concave modulus of continuity $\omega' \ge \omega$ with $\omega_{\omega'} = \omega'$ and $\omega'(h) = \omega'(1)$ for $h \ge 1$.

Proof.
Define $\omega'$ as the concave majorant of $\omega$, i.e.
$$\omega'(h) := \inf\big\{\alpha h + \beta : \alpha t + \beta \ge \omega(t) \text{ for all } t \ge 0\big\}.$$
Clearly, $\omega' \ge \omega$. In particular, $\omega'$ is non-negative and $\omega'(h) \ge \omega(h) = \omega(1)$ for $h \ge 1$. Also, since $\omega(1) \ge \omega(t)$ for all $t \ge 0$, $\omega'(h) \le \omega(1)$ for all $h \ge 0$. Therefore, $\omega'(h) = \omega'(1) = \omega(1)$ for all $h \ge 1$.

We show that $\omega'(0) = 0$. If $\omega(h) = 0$ for all $h \ge 0$, this is trivial. Otherwise, for all $\varepsilon \in (0, \omega(1))$ there exists a $\delta > 0$ such that $\omega(h) < \varepsilon$ for $h \le \delta$, since $\omega(0+) = \omega(0) = 0$. Define
$$\alpha = \frac{\omega(1) - \varepsilon}{\delta}.$$
Then $\alpha t + \varepsilon \ge \omega(t)$ for all $t \ge 0$. Since $\varepsilon > 0$ was arbitrary, $\omega'(0) = 0$.

Next, we show that $\omega'$ is increasing. Since $\omega$ is non-negative, we can restrict the infimum in the definition of $\omega'$ to non-negative values of $\alpha$ (negative values of $\alpha$ lead to negative values of $\alpha t + \beta$ for $t$ sufficiently large). Let $t, h, \varepsilon > 0$ and let $\alpha \ge 0$, $\beta \in \mathbb{R}$ be such that
$$\omega(s) \le \alpha s + \beta \quad \text{for all } s \ge 0$$
and
$$\omega'(t+h) \ge \alpha(t+h) + \beta - \varepsilon.$$
Then,
$$\omega'(t) \le \alpha t + \beta \le \alpha(t+h) + \beta \le \omega'(t+h) + \varepsilon.$$
Since $\varepsilon > 0$ was arbitrary, we have $\omega'(t) \le \omega'(t+h)$, and $\omega'$ is increasing.

Now we show that $\omega'$ is continuous. Let $t \ge 0$. Since $\omega'$ is concave,
$$\omega'\big(\lambda t + (1-\lambda)x\big) \ge \lambda \omega'(t) + (1-\lambda)\omega'(x)$$
for $\lambda \in [0,1]$. Taking $x = 0$ and letting $\lambda$ tend to one, we have
$$\omega'(t-) \ge \omega'(t),$$
at least if $t \ne 0$. Since $\omega'$ is increasing, $\omega'(t-) = \omega'(t)$. On the other hand,
$$\omega'(t) = \omega'\Big(\lambda(t-\lambda) + (1-\lambda)\Big(t + \frac{\lambda^2}{1-\lambda}\Big)\Big) \ge \lambda \omega'(t-\lambda) + (1-\lambda)\omega'\Big(t + \frac{\lambda^2}{1-\lambda}\Big).$$
Taking $\lambda$ to zero yields $\omega'(t) \ge \omega'(t+)$. Again since $\omega'$ is increasing,
$$\omega'(t+) = \omega'(t) = \omega'(t-).$$
In particular, $\omega'$ is continuous.

It remains to show that $\omega_{\omega'} = \omega'$. We show that $\omega'$ is subadditive; the proof is then analogous to the proof of Proposition 2.2.10. Since $\omega'$ is concave, we have
$$\omega'(\lambda x) = \omega'\big(\lambda x + (1-\lambda) \cdot 0\big) \ge \lambda \omega'(x) + (1-\lambda)\omega'(0) = \lambda \omega'(x)$$
for $x \ge 0$, $\lambda \in [0,1]$. Let $s, t \ge 0$. Then
$$\omega'(s+t) = \frac{s}{s+t}\,\omega'(s+t) + \frac{t}{s+t}\,\omega'(s+t) \le \omega'\Big(\frac{s}{s+t}(s+t)\Big) + \omega'\Big(\frac{t}{s+t}(s+t)\Big) = \omega'(s) + \omega'(t).$$

We mainly exploit the following property of concave functions.
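For sampled data, the concave majorant appearing in Lemma 2.2.14 can be computed as the upper convex hull of the graph. The following sketch (our own numerical illustration; the function name is ours) checks on the non-concave modulus $\omega(h) = h^2$ on $[0,1]$ that the majorant is the chord $h \mapsto h$:

```python
import numpy as np

def concave_majorant(xs, ys):
    """Least concave majorant of sampled points (xs[i], ys[i]), computed as
    the upper convex hull of the graph (xs must be increasing)."""
    hull = []  # indices of upper-hull vertices, left to right
    for i in range(len(xs)):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            # drop the middle point if it lies on or below the chord i1 -> i
            cross = (xs[i2] - xs[i1]) * (ys[i] - ys[i1]) \
                  - (ys[i2] - ys[i1]) * (xs[i] - xs[i1])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(xs, xs[hull], ys[hull])

xs = np.linspace(0.0, 1.0, 101)
omega = xs ** 2                       # increasing, omega(0) = 0, not concave
maj = concave_majorant(xs, omega)
assert np.all(maj >= omega - 1e-12)   # the majorant dominates omega
assert np.allclose(maj, xs)           # majorant of h^2 on [0, 1] is the chord h
```

On $[0,1]$ this recovers exactly the infimum over affine functions lying above $\omega$, restricted to the sampled interval.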
Lemma 2.2.15. Let $\mathcal{I}$ be a bounded or unbounded interval, and let $g : \mathcal{I} \to \mathbb{R}$ be a concave function. Let $x, y, x+h, y+h \in \mathcal{I}$ with $x \ge y$ and $h \ge 0$. Then
$$g(x+h) - g(x) \le g(y+h) - g(y).$$

Proof. By the definition of concavity, the graph of $g$ on the interval $[y, y+h]$ lies "above" the secant
$$s(t) := g(y) + (t-y)\,\frac{g(y+h) - g(y)}{h}.$$
Indeed,
$$s(t) = \frac{y+h-t}{h}\,g(y) + \frac{t-y}{h}\,g(y+h) \le g\Big(\frac{y+h-t}{h}\,y + \frac{t-y}{h}\,(y+h)\Big) = g(t)$$
for $t \in [y, y+h]$. We show that on $\mathcal{I} \setminus [y, y+h]$ the graph of $g$ lies "below" the secant $s$. Suppose that there exists a $t \in \mathcal{I} \setminus [y, y+h]$ such that $g(t) > s(t)$, and assume without loss of generality that $t > y+h$. Let $u \in (y, y+h)$ and let $\lambda \in [0,1]$ be such that
$$y + h = \lambda t + (1-\lambda)u.$$
Then
$$s(y+h) = \lambda s(t) + (1-\lambda)s(u) < \lambda g(t) + (1-\lambda)g(u) \le g(y+h) = s(y+h),$$
a contradiction.

To prove the statement of the lemma, we distinguish two different cases. First, assume that $y \le x \le y+h \le x+h$. Let $s$ be defined as above. Since $s$ is affine,
$$g(y+h) - g(y) = s(y+h) - s(y) = s(x+h) - s(x) \ge g(x+h) - g(x).$$
On the other hand, assume that $y \le y+h \le x \le x+h$. Inductively applying the first case, we have
$$g(y+h) - g(y) \ge g(y+2h) - g(y+h) \ge g(y+3h) - g(y+2h) \ge \dots$$
For some $k \in \mathbb{N}$, we have $y + kh \le x \le y + (k+1)h \le x+h$. Again, we apply the first case and have
$$g(y+h) - g(y) \ge g(y+(k+1)h) - g(y+kh) \ge g(x+h) - g(x).$$

Finally, we prove that concave functions are almost Lipschitz continuous.

Lemma 2.2.16. Let $g : [0,1] \to \mathbb{R}$ be a concave increasing function. Then $g$ is Lipschitz continuous on all intervals $[\varepsilon, 1]$ with $\varepsilon \in (0,1)$.

Proof. Let $\varepsilon \in (0,1)$ and let $s$ be the secant through the points $(0, g(0))$ and $(\varepsilon, g(\varepsilon))$. We write $s(t) = \alpha t + g(0)$ for the correct value of $\alpha$. In the proof of Lemma 2.2.15, we have shown that $s(t) \le g(t)$ for $t \in [0, \varepsilon]$ and $s(t) \ge g(t)$ for $t \in [\varepsilon, 1]$. Let $\varepsilon \le x \le y \le 1$ and let $s'$ be the secant through the points $(0, g(0))$ and $(x, g(x))$.
Again write $s'(t) = \alpha' t + g(0)$ for the correct value of $\alpha'$. Since $g$ is concave,
$$s'(\varepsilon) \le g(\varepsilon) = s(\varepsilon).$$
Therefore, $0 \le \alpha' \le \alpha$. Since $g$ is increasing and concave,
$$|g(y) - g(x)| = g(y) - g(x) \le s'(y) - s'(x) = \alpha'(y - x) \le \alpha|y - x|.$$
Hence, $g$ is Lipschitz continuous with Lipschitz constant $\alpha$ on $[\varepsilon, 1]$.

We now prove that we cannot make any reasonable conclusion on the modulus of continuity of the variation function if we only know the modulus of continuity of the parent function.

Theorem 2.2.17. Let $\omega, \omega'$ be two moduli of continuity such that
$$\lim_{h \to 0} \frac{\omega(h)}{h} = \infty,$$
and $\omega'$ is bounded. Then there exists a function $f : [0,1] \to \mathbb{R}$ of bounded variation such that $\omega_f \le \omega$ and $\omega_{\mathrm{Var}_f} \ge \omega'$.

Remark 2.2.18. The condition on $\omega$ is necessary, as otherwise $f$ is Lipschitz continuous, which again implies that $\mathrm{Var}_f$ is Lipschitz continuous by Theorem 2.2.5. The condition on $\omega'$ is necessary, since $f$ needs to be of bounded variation, and thus $\mathrm{Var}_f$ and $\omega_{\mathrm{Var}_f}$ are bounded as well.

Proof. Using Lemma 2.2.12, Lemma 2.2.13 and Lemma 2.2.14, we can assume without loss of generality that $\omega'(h) = \omega'(1)$ for $h \ge 1$, $\omega_{\omega'} = \omega'$, and $\omega'$ is concave.

Define the function $V : [0,1] \to \mathbb{R}$, $V(x) = \omega'(x)$. Then $\omega_V = \omega'$. We inductively construct a non-negative function $f$ on the intervals $[x_1, x_0], [x_2, x_1], \dots$ with $x_0 = 1$ and $x_n \to 0$, such that $\omega$ is a modulus of continuity for $f$ and $\mathrm{Var}_f = V$.

Assume we have already constructed $f$ on the interval $[x_n, 1]$. If $x_n = 0$, we have already defined $f$ on the entire interval $[0,1]$. Otherwise, we define $x_{n+1}$ and construct $f$ on the interval $[x_{n+1}, x_n]$. First, to every point $x \in [0, x_n]$, we assign a point $y_x \in [x, x_n]$ with the property that
$$V(y_x) = \frac{V(x) + V(x_n)}{2}.$$
Such a point $y_x$ exists, since $V$ is increasing and continuous. Define the set
$$A_{n+1} := \big\{x \in [0, x_n] : V(x+h) - V(x) \le \omega(h) \text{ for all } h \in [0, y_x - x]\big\}.$$
Since both $V$ and $\omega$ are continuous, the set $A_{n+1}$ is closed, and thus compact.
It is non-empty since $x_n \in A_{n+1}$. Therefore,
$$x_{n+1} := \inf A_{n+1} \in A_{n+1}.$$
Furthermore, we define $y_{n+1} := y_{x_{n+1}}$. Finally, we define the function $f$ on $[x_{n+1}, x_n]$ as
$$f(z) = \begin{cases} V(z) - V(x_{n+1}) & z \in [x_{n+1}, y_{n+1}] \\ V(x_n) - V(z) & z \in [y_{n+1}, x_n]. \end{cases}$$
We note some simple facts about the function $f$. We always have $f(x_n) = 0$ and
$$f(y_n) = \frac{V(x_{n-1}) - V(x_n)}{2}.$$
Since $V$ is continuous, $f$ is continuous where it is defined. Since $V$ is increasing, $f$ is piecewise monotone; $f$ is increasing on the intervals $[x_n, y_n]$ and decreasing on the intervals $[y_{n+1}, x_n]$. Since $V$ is concave, $f$ is concave on the intervals $[x_n, y_n]$ and convex on the intervals $[y_{n+1}, x_n]$.

[Figure: the parent function $f$ (blue) and the variation function $V$ (red) on the interval $[x_n, x_{n-1}]$, with the points $x_n$, $y_{x_n} = y_n$ and $x_{n-1}$ marked on the horizontal axis.]

The above picture shows such a function $f$ on the interval $[x_n, x_{n-1}]$. The red function is the variation function $V$, the blue function is the parent function $f$. On the interval $[x_n, y_n]$, $f(z) = V(z) + c$, and on the interval $[y_n, x_{n-1}]$, $f(z) = -V(z) + c'$. This construction already suggests that $\mathrm{Var}_f = V$. The constants $c$ and $c'$ are chosen such that $f(x_n) = f(x_{n-1}) = 0$, and the point $y_n$ is chosen such that $f$ is continuous. The point $x_n$ is chosen such that $\omega$ is a modulus of continuity for $f$ (a priori at least on the interval $[x_n, y_n]$).

The remaining proof is split into four steps. First, we show that $(x_n)$ converges to zero. Hence, we have defined the function $f$ on the interval $(0,1]$. Second, we prove that $f(0+) = 0$, and, therefore, extend $f$ continuously to $[0,1]$ with $f(0) = 0$. Then, we show that $\omega_f \le \omega$ and finally, we prove that $\mathrm{Var}_f = V$.

1. Clearly, $(x_n)$ is decreasing and bounded from below by zero. Thus, $(x_n)$ converges, say to the point $x \in [0,1]$. Assume that $x \ne 0$. Since $V$ is concave, it is Lipschitz continuous with constant $L$ on $[x/2, 1]$ by Lemma 2.2.16. Since
$$\lim_{h \to 0} \frac{\omega(h)}{h} = \infty,$$
there exists an $\varepsilon > 0$ such that $\omega(h) \ge Lh$ for all $h \in [0, \varepsilon]$. Let $n \in \mathbb{N}$ be sufficiently large such that $0 \le x_n - x \le \varepsilon/2$. Define $z_{n+1} := \max\{x/2,\ x_n - \varepsilon\} \in [x/2, 1]$.
Then
$$V(z_{n+1} + h) - V(z_{n+1}) \le Lh \le \omega(h)$$
for $h \in [0, \varepsilon]$. Hence, $z_{n+1} \in A_{n+1}$ and $x_{n+1} = \min A_{n+1} \le z_{n+1} < x$, a contradiction. Therefore, $(x_n)$ converges to zero. In particular, we have also shown that $(x_n)$ is strictly decreasing.

2. If the sequence $(x_n)$ is finite, this statement is trivial, since then $x_n = 0$ for some $n \in \mathbb{N}$. If $(x_n)$ is infinite, it suffices to show that $f(y_n)$ converges to zero. Suppose this is not the case. Then there exists an $\varepsilon > 0$ such that $f(y_n) \ge \varepsilon$ for infinitely many $n \in \mathbb{N}$. Let $(y_{n_k})_k$ be a subsequence of $(y_n)$ with $f(y_{n_k}) \ge \varepsilon$. Since $V$ is increasing,
$$V(1) - V(0) \ge V(y_{n_1}) - V(x_{n_k}) \ge \sum_{j=1}^{k} \big(V(y_{n_j}) - V(x_{n_j})\big) = \sum_{j=1}^{k} f(y_{n_j}) \ge k\varepsilon$$
for all $k \in \mathbb{N}$, a contradiction. Hence, $f(0+) = 0$ and we extend $f$ continuously to $[0,1]$ with $f(0) = 0$.

3. Let $h \ge 0$. We show that $\omega_f(h) \le \omega(h)$, i.e.
$$\sup\big\{|f(x) - f(y)| : x, y \in [0,1],\ |x-y| \le h\big\} \le \omega(h).$$
Since $\omega$ is increasing, it suffices to show that
$$\sup\big\{|f(x) - f(y)| : x, y \in [0,1],\ |x-y| = h\big\} \le \omega(h).$$
This in turn is equivalent to
$$\sup\big\{|f(x+h) - f(x)| : x \in [0, 1-h]\big\} \le \omega(h).$$
Let $x \in [0, 1-h]$. It remains to show that
$$|f(x+h) - f(x)| \le \omega(h).$$
We also write $y$ instead of $x+h$. We distinguish several different cases depending on the positions of $x$ and $y$ relative to the points $x_n$ and $y_n$. To every point $z \in (0,1]$, we can assign $n(z) \in \mathbb{N}$ such that $x_{n(z)} < z \le x_{n(z)-1}$. The special case $x = 0$ is treated at the very end as Case 3.

Case 1. We have $n := n(x) = n(y)$. We distinguish whether $x, y$ are in the intervals $[x_n, y_n]$ or $[y_n, x_{n-1}]$.

Case 1.1. We have $x, y \in [x_n, y_n]$. Using Lemma 2.2.15,
$$|f(y) - f(x)| = \big|\big(V(y) - V(x_n)\big) - \big(V(x) - V(x_n)\big)\big| = V(y) - V(x) = V(x+h) - V(x) \le V(x_n + h) - V(x_n) \le \omega(h).$$

Case 1.2. We have $x, y \in [y_n, x_{n-1}]$. Here, we need an additional distinction on the distance $h = y - x$.

Case 1.2.1. Assume that $h \le y_n - x_n$.
Using Lemma 2.2.15 and Case 1.1,
$$|f(y) - f(x)| = \big|\big(V(x_{n-1}) - V(y)\big) - \big(V(x_{n-1}) - V(x)\big)\big| = V(y) - V(x) = V(x+h) - V(x) \le V(x_n + h) - V(x_n) \le \omega(h).$$

Case 1.2.2. Assume that $h \ge y_n - x_n$. Using the defining property of $y_n$,
$$|f(y) - f(x)| = \big|\big(V(x_{n-1}) - V(y)\big) - \big(V(x_{n-1}) - V(x)\big)\big| = V(y) - V(x) \le V(x_{n-1}) - V(y_n) = V(y_n) - V(x_n) \le \omega(y_n - x_n) \le \omega(h).$$

Case 1.3. We have $x \in [x_n, y_n]$ and $y \in [y_n, x_{n-1}]$. Using the preceding cases,
$$|f(y) - f(x)| \le \max\big\{|f(y) - f(y_n)|,\ |f(x) - f(y_n)|\big\} \le \max\big\{\omega(y - y_n),\ \omega(y_n - x)\big\} = \omega\big(\max\{y - y_n,\ y_n - x\}\big) \le \omega(h).$$

Case 2. We have $m := n(y) < n(x) =: n$. We again distinguish several different cases and reduce them all to Case 1.

Case 2.1. We have $x \in [x_n, y_n]$.

Case 2.1.1. We have $y \in [x_m, y_m]$.

Case 2.1.1.1. We have $f(x) \le f(y)$. Then,
$$|f(y) - f(x)| = f(y) - f(x) \le f(y) = f(y) - f(x_m) = |f(y) - f(x_m)| \le \omega(y - x_m) \le \omega(y - x) = \omega(h).$$

Case 2.1.1.2. We have $f(y) \le f(x)$. Then,
$$|f(y) - f(x)| = f(x) - f(y) \le f(x) = f(x) - f(x_{n-1}) = |f(x) - f(x_{n-1})| \le \omega(x_{n-1} - x) \le \omega(y - x) = \omega(h).$$

Case 2.1.2. We have $y \in [y_m, x_{m-1}]$.

Case 2.1.2.1. We have $f(x) \le f(y)$. Then,
$$|f(y) - f(x)| = f(y) - f(x) \le f(y) = f(y) - f(x_m) = |f(y) - f(x_m)| \le \omega(y - x_m) \le \omega(y - x) = \omega(h).$$

Case 2.1.2.2. We have $f(y) \le f(x)$. Then,
$$|f(y) - f(x)| = f(x) - f(y) \le f(x) = f(x) - f(x_{n-1}) = |f(x) - f(x_{n-1})| \le \omega(x_{n-1} - x) \le \omega(y - x) = \omega(h).$$

Case 2.2. We have $x \in [y_n, x_{n-1}]$.

Case 2.2.1. We have $y \in [x_m, y_m]$.

Case 2.2.1.1. We have $f(x) \le f(y)$. Then,
$$|f(y) - f(x)| = f(y) - f(x) \le f(y) = f(y) - f(x_m) = |f(y) - f(x_m)| \le \omega(y - x_m) \le \omega(y - x) = \omega(h).$$

Case 2.2.1.2. We have $f(y) \le f(x)$. Then,
$$|f(y) - f(x)| = f(x) - f(y) \le f(x) = f(x) - f(x_{n-1}) = |f(x) - f(x_{n-1})| \le \omega(x_{n-1} - x) \le \omega(y - x) = \omega(h).$$

Case 2.2.2. We have $y \in [y_m, x_{m-1}]$.

Case 2.2.2.1. We have $f(x) \le f(y)$. Then,
$$|f(y) - f(x)| = f(y) - f(x) \le f(y) = f(y) - f(x_m) = |f(y) - f(x_m)| \le \omega(y - x_m) \le \omega(y - x) = \omega(h).$$

Case 2.2.2.2. We have $f(y) \le f(x)$.
Then,
$$|f(y) - f(x)| = f(x) - f(y) \le f(x) = f(x) - f(x_{n-1}) = |f(x) - f(x_{n-1})| \le \omega(x_{n-1} - x) \le \omega(y - x) = \omega(h).$$

Case 3. We have $x = 0$. Define $n := n(h) = n(y)$. Then,
$$|f(y) - f(x)| = f(h) = f(h) - f(x_n) = |f(h) - f(x_n)| \le \omega(h - x_n) \le \omega(h).$$

4. Using Lemma 2.2.3 and that $f$ is continuous at zero and piecewise monotone, we have for $x \in [0,1]$ with $x_n \le x \le y_n$ that
$$\mathrm{Var}_f(x) = \mathrm{Var}(f; 0, x) = \sum_{k=n}^{\infty} \Big(\mathrm{Var}(f; x_{k+1}, y_{k+1}) + \mathrm{Var}(f; y_{k+1}, x_k)\Big) + \mathrm{Var}(f; x_n, x)$$
$$= \sum_{k=n}^{\infty} \Big(f(y_{k+1}) - f(x_{k+1}) + f(y_{k+1}) - f(x_k)\Big) + f(x) - f(x_n)$$
$$= 2\sum_{k=n}^{\infty} f(y_{k+1}) + f(x) = 2\sum_{k=n}^{\infty} \frac{V(x_k) - V(x_{k+1})}{2} + V(x) - V(x_n)$$
$$= -\lim_{k \to \infty} V(x_{k+1}) + V(x_n) + V(x) - V(x_n) = V(x).$$
Similarly, for $y_{n+1} \le x \le x_n$, we have
$$\mathrm{Var}_f(x) = \mathrm{Var}(f; 0, x) = \sum_{k=n+1}^{\infty} \Big(\mathrm{Var}(f; x_{k+1}, y_{k+1}) + \mathrm{Var}(f; y_{k+1}, x_k)\Big) + \mathrm{Var}(f; x_{n+1}, y_{n+1}) + \mathrm{Var}(f; y_{n+1}, x)$$
$$= \sum_{k=n+1}^{\infty} \Big(f(y_{k+1}) - f(x_{k+1}) + f(y_{k+1}) - f(x_k)\Big) + f(y_{n+1}) - f(x_{n+1}) + f(y_{n+1}) - f(x)$$
$$= 2\sum_{k=n}^{\infty} f(y_{k+1}) - f(x) = 2\sum_{k=n}^{\infty} \frac{V(x_k) - V(x_{k+1})}{2} - \big(V(x_n) - V(x)\big)$$
$$= -\lim_{k \to \infty} V(x_{k+1}) + V(x_n) - V(x_n) + V(x) = V(x).$$

2.3 Decomposition into monotone functions

The main result of this section is that we can decompose functions of bounded variation into the difference of two monotone functions. We can even state such a decomposition explicitly. Throughout this section, we only consider functions defined on a fixed interval $\mathcal{I} = [a,b]$.

In Example 2.1.5 we have seen that monotone functions are of bounded total variation. It is easily seen that linear combinations of functions of bounded variation are again of bounded variation.

Proposition 2.3.1. If $f$ and $g$ are of bounded variation and if $\alpha, \beta \in \mathbb{R}$, then
$$\mathrm{Var}(\alpha f + \beta g) \le |\alpha| \mathrm{Var}(f) + |\beta| \mathrm{Var}(g).$$
In particular, the set $\mathrm{BV}(\mathcal{I})$ is a vector space.

Proof. Let $f, g \in \mathrm{BV}$ and let $\alpha, \beta \in \mathbb{R}$.
Then
$$\mathrm{Var}(\alpha f + \beta g) = \sup_{Y \in \mathcal{Y}} \sum_{y \in Y} |(\alpha f + \beta g)(y_+) - (\alpha f + \beta g)(y)|$$
$$= \sup_{Y \in \mathcal{Y}} \sum_{y \in Y} |\alpha f(y_+) - \alpha f(y) + \beta g(y_+) - \beta g(y)|$$
$$\le \sup_{Y \in \mathcal{Y}} \sum_{y \in Y} \big(|\alpha|\,|f(y_+) - f(y)| + |\beta|\,|g(y_+) - g(y)|\big)$$
$$\le |\alpha| \sup_{Y \in \mathcal{Y}} \sum_{y \in Y} |f(y_+) - f(y)| + |\beta| \sup_{Y \in \mathcal{Y}} \sum_{y \in Y} |g(y_+) - g(y)|$$
$$= |\alpha| \mathrm{Var}(f) + |\beta| \mathrm{Var}(g) < \infty.$$

Thus, the difference of two monotone functions is again of bounded variation. The following theorem states that the converse is also true, i.e. all functions of bounded variation can be written as the difference of two increasing functions. This theorem is of fundamental importance, since it enables us to extend many results for monotone functions to functions of bounded variation. It is also called the Jordan Decomposition Theorem and is due to Jordan, who was the first to introduce functions of bounded variation (see for example [45]).

Theorem 2.3.2 (Jordan Decomposition Theorem). If $f : [a,b] \to \mathbb{R}$ is of bounded variation, then there are increasing functions $f^+, f^- : [a,b] \to \mathbb{R}$ with $f^+(a) = f^-(a) = 0$ and
$$f(x) - f(a) = f^+(x) - f^-(x), \qquad \mathrm{Var}_f(x) = f^+(x) + f^-(x). \qquad (2.7)$$
Furthermore, this decomposition is unique and the functions satisfy
$$\mathrm{Var}(f^+ - f^-) = \mathrm{Var}(f^+ + f^-) = \mathrm{Var}(f^+) + \mathrm{Var}(f^-) = \mathrm{Var}(f) = \mathrm{Var}(\mathrm{Var}_f).$$
If $f$ is right-continuous, then also $f^+$ and $f^-$ are right-continuous. Similar statements hold for left-continuous and continuous $f$.

Proof. We can reformulate the equations (2.7) as
$$f^+(x) = \frac{1}{2}\big(\mathrm{Var}_f(x) + f(x) - f(a)\big),$$
$$f^-(x) = \frac{1}{2}\big(\mathrm{Var}_f(x) - f(x) + f(a)\big).$$
The uniqueness is apparent from this representation and the claims about the continuity follow from Theorem 2.2.5. It remains to show that $f^+$ and $f^-$ are increasing. We show that $\mathrm{Var}_f(x) \pm f(x)$ is increasing. Take $\varepsilon > 0$ and $x_1, x_2 \in [a,b]$ with $x_1 < x_2$, and let $Y$ be a partition of $[a, x_1]$ with
$$\sum_{y \in Y} |f(y_+) - f(y)| \ge \mathrm{Var}_f(x_1) - \varepsilon.$$
Then
$$\mathrm{Var}_f(x_2) \pm f(x_2) \ge \sum_{y \in Y} |f(y_+) - f(y)| + |f(x_2) - f(x_1)| \pm f(x_2)$$
$$\ge \mathrm{Var}_f(x_1) - \varepsilon + |f(x_2) - f(x_1)| \pm \big(f(x_2) - f(x_1)\big) \pm f(x_1) \ge \mathrm{Var}_f(x_1) - \varepsilon \pm f(x_1).$$
Since $\varepsilon > 0$ is arbitrary, we get
$$\mathrm{Var}_f(x_2) \pm f(x_2) \ge \mathrm{Var}_f(x_1) \pm f(x_1).$$
Finally, $\mathrm{Var}(f) = \mathrm{Var}(f - f(a)) = \mathrm{Var}(f^+ - f^-)$. Furthermore, by Proposition 2.2.2, $\mathrm{Var}(f) = \mathrm{Var}(\mathrm{Var}_f) = \mathrm{Var}(f^+ + f^-)$. Since $f^+$ and $f^-$ are increasing, $f^+ + f^-$ is increasing as well, and thus
$$\mathrm{Var}(f^+ + f^-) = (f^+ + f^-)(b) - (f^+ + f^-)(a) = \big(f^+(b) - f^+(a)\big) + \big(f^-(b) - f^-(a)\big) = \mathrm{Var}(f^+) + \mathrm{Var}(f^-).$$

Remark 2.3.3. The functions $f^+$ and $f^-$ in the above theorem are called the positive and negative variation functions of $f$, respectively. Notice that the Jordan Decomposition Theorem implies that $\mathrm{BV}$ is the linear hull of the monotone functions (which do not form a vector space on their own).

2.4 Continuity, differentiability and measurability

We have seen in Example 2.1.9 that there are differentiable functions that are of unbounded variation. On the other hand, Example 2.1.4 shows us that there are discontinuous functions that are of bounded variation. In light of these examples, we want to examine the connection between bounded variation and continuity and differentiability more closely. All proofs in this chapter use the monotone decomposition of functions of bounded variation. So we always first prove the corresponding statements for increasing functions, and then transfer them to functions of bounded variation.

First, recall that points of discontinuity of a function can be classified into different types. The two most interesting types for our studies are the removable discontinuities and the discontinuities of jump type. A function $f$ has a removable discontinuity at $x_0$ if $f(x_0-)$ and $f(x_0+)$ exist and are finite, and if $f(x_0-) = f(x_0+) \ne f(x_0)$. Those discontinuities are called removable, since they can be removed by redefining $f$ at $x_0$ as $f(x_0-)$ (or $f(x_0+)$). A discontinuity of $f$ at $x_0$ is of jump type if $f(x_0+)$ and $f(x_0-)$ exist and are finite, but different.
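The explicit formulas for $f^+$ and $f^-$ in the proof of Theorem 2.3.2 lend themselves to a direct numerical check. The following sketch (our own illustration, approximating $\mathrm{Var}_f$ by cumulative sums of increments along a grid; the sample function is an arbitrary choice) verifies the decomposition:

```python
import numpy as np

# Discrete check of the Jordan decomposition (Theorem 2.3.2) on a grid:
# Var_f is approximated by cumulative sums of |increments|, and
# f+ = (Var_f + f - f(a)) / 2,  f- = (Var_f - f + f(a)) / 2.
xs = np.linspace(0.0, 1.0, 1000)
f = np.sin(7 * xs) + 0.3 * xs                 # an arbitrary BV sample function
var_f = np.concatenate([[0.0], np.cumsum(np.abs(np.diff(f)))])
f_plus = 0.5 * (var_f + f - f[0])
f_minus = 0.5 * (var_f - f + f[0])

assert f_plus[0] == 0.0 and f_minus[0] == 0.0  # both vanish at a
assert np.all(np.diff(f_plus) >= -1e-12)       # f+ is increasing
assert np.all(np.diff(f_minus) >= -1e-12)      # f- is increasing
assert np.allclose(f - f[0], f_plus - f_minus)
assert np.allclose(var_f, f_plus + f_minus)
```

Monotonicity of the two parts holds exactly here, since the increments of $f^+$ are $\tfrac{1}{2}(|\Delta f| + \Delta f) \ge 0$ and those of $f^-$ are $\tfrac{1}{2}(|\Delta f| - \Delta f) \ge 0$.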
Other types of discontinuities are, for example, infinite discontinuities (when the function blows up) or mixed discontinuities (when at least one of the one-sided limits does not exist).

Lemma 2.4.1. Let $f : [a,b] \to \mathbb{R}$ be an increasing function. Then the number of discontinuities of $f$ is countable and they are all of jump type. Furthermore, $f$ is Borel-measurable.

Proof. Let $f$ be discontinuous at $x_0$. Since $f$ is increasing and bounded, the limits
$$f(x_0-) = \lim_{\varepsilon \downarrow 0} f(x_0 - \varepsilon) \quad \text{and} \quad f(x_0+) = \lim_{\varepsilon \downarrow 0} f(x_0 + \varepsilon)$$
exist and are finite. If the limits are equal, then $f$ has a removable discontinuity at $x_0$. It is easily seen that this is a contradiction to the monotonicity of $f$. Therefore, the discontinuity is of jump type. Again since $f$ is increasing, we have $f(x_0-) < f(x_0+)$. Since there is a different rational number in each of the intervals $(f(x_0-), f(x_0+))$ when $x_0$ is a discontinuity of $f$, the set of discontinuities must be countable. The measurability follows immediately from the fact that the sets $f^{-1}((-\infty, \alpha))$ are intervals.

Theorem 2.4.2. Functions of bounded variation have at most countably many discontinuities. Those discontinuities are removable or of jump type. Furthermore, functions of bounded variation are Borel-measurable.

Proof of Theorem 2.4.2. By Theorem 2.3.2 we can write functions of bounded variation as the difference of two monotonically increasing functions. By Lemma 2.4.1, those functions only have a countable number of discontinuities and are Borel-measurable. Thus, functions of bounded variation can also only have a countable number of discontinuities, and they are Borel-measurable as well. Furthermore, since one-sided limits of increasing functions exist, they also exist for functions of bounded variation, proving that the points of discontinuity are either removable or of jump type.

Next, we show that functions of bounded variation are differentiable almost everywhere. In our proof, we follow Royden [44] closely.

Definition 2.4.3.
Let $A \subset \mathbb{R}$ be a set and let $\mathcal{J}$ be a collection of non-degenerate intervals (i.e. we only consider intervals with infinitely many points) covering $A$. Then $\mathcal{J}$ is called a Vitali-cover of $A$ if for all $x \in A$ and $\varepsilon > 0$ there exists an interval $I \in \mathcal{J}$ such that $x \in I$ and $\lambda(I) < \varepsilon$.

Lemma 2.4.4 (Vitali Covering Lemma). Let $A \subset \mathbb{R}$ be a set of finite outer measure and let $\mathcal{J}$ be a Vitali-cover of $A$. Then for all $\varepsilon > 0$ there exists a finite collection $I_1, \dots, I_n$ of pairwise disjoint intervals in $\mathcal{J}$ such that
$$\lambda^*\Big(A \setminus \bigcup_{i=1}^{n} I_i\Big) < \varepsilon.$$

Proof. We can assume without loss of generality that all the intervals in $\mathcal{J}$ are closed. Otherwise, we replace them by their closure and note that the set of the endpoints of $I_1, \dots, I_n$ has measure zero.

Let $U$ be an open set of finite measure containing $A$. Since $\mathcal{J}$ is a Vitali-cover, we assume without loss of generality that $U$ contains all the intervals in $\mathcal{J}$. We construct the sequence $I_1, \dots, I_n$ inductively. Choose $I_1$ in $\mathcal{J}$ arbitrarily. Suppose we have already determined $I_1, \dots, I_k$.

First, assume that there is no interval $I \in \mathcal{J}$ that is disjoint from the intervals $I_1, \dots, I_k$. We show that then
$$A \subset \bigcup_{i=1}^{k} I_i. \qquad (2.8)$$
Indeed, let $x \in A \setminus \bigcup_{i=1}^{k} I_i$. Since $\bigcup_{i=1}^{k} I_i$ is closed, there exists a $\delta > 0$ such that
$$(x - \delta, x + \delta) \cap \bigcup_{i=1}^{k} I_i = \emptyset.$$
Since $\mathcal{J}$ is a Vitali-cover, there exists an interval $I \in \mathcal{J}$ with $x \in I$ and $\lambda(I) < \delta$. For this interval, we have
$$I \cap \bigcup_{i=1}^{k} I_i \subset (x - \delta, x + \delta) \cap \bigcup_{i=1}^{k} I_i = \emptyset.$$
This is a contradiction to our assumption that no such disjoint interval exists. Hence, (2.8) holds, which proves the lemma.

On the other hand, assume that there exists an interval $I \in \mathcal{J}$ that is disjoint from $I_1, \dots, I_k$. Let $a_k$ be the supremum over all the lengths of the intervals in $\mathcal{J}$ that are disjoint from $I_1, \dots, I_k$. Since each interval in $\mathcal{J}$ is contained in $U$, it is clear that $a_k \le \lambda(U) < \infty$. Now choose $I_{k+1}$ as an interval in $\mathcal{J}$ that is disjoint from $I_1, \dots, I_k$ and satisfies $\lambda(I_{k+1}) \ge a_k/2$.
With the above procedure we get a sequence $(I_k)$ of pairwise disjoint intervals in $\mathcal{J}$. Since
$$\sum_{k=1}^{\infty} \lambda(I_k) = \lambda\Big(\bigcup_{k=1}^{\infty} I_k\Big) \le \lambda(U) < \infty, \qquad (2.9)$$
the series converges and there exists an $n \in \mathbb{N}$ such that
$$\sum_{k=n+1}^{\infty} \lambda(I_k) < \varepsilon/5.$$
It remains to show that $\lambda^*(R) < \varepsilon$ with
$$R = A \setminus \bigcup_{k=1}^{n} I_k.$$
Let $x \in R$. Since $\bigcup_{k=1}^{n} I_k$ is closed and $\mathcal{J}$ is a Vitali-cover, there exists an interval $I$ in $\mathcal{J}$ small enough such that $x \in I$ and $I$ is disjoint from $I_1, \dots, I_n$. We show that $I$ intersects some $I_k$ for $k$ large enough. Indeed, if $I$ is disjoint from all $I_k$, then $a_k \ge \lambda(I)$ for all $k \in \mathbb{N}$. Hence, we have $\lambda(I_k) \ge \lambda(I)/2$ for all $k \in \mathbb{N}$. This is a contradiction to (2.9). Therefore, $I$ intersects some $I_k$.

Let $k$ be the smallest integer such that $I$ intersects $I_k$. Then $k > n$ and
$$\lambda(I) \le a_{k-1} \le 2\lambda(I_k).$$
Since $x \in I$ and since $I$ intersects $I_k$, the distance of $x$ to the midpoint of $I_k$ is at most
$$\lambda(I) + \frac{1}{2}\lambda(I_k) \le \frac{5}{2}\lambda(I_k).$$
Therefore, if we define $J_k$ to be $I_k$ stretched by a factor of $5$ with the same midpoint, then $x \in J_k$. Hence,
$$R \subset \bigcup_{k=n+1}^{\infty} J_k$$
and therefore
$$\lambda^*(R) \le \sum_{k=n+1}^{\infty} \lambda(J_k) = 5 \sum_{k=n+1}^{\infty} \lambda(I_k) < \varepsilon,$$
which is what had to be shown.

We use this lemma to prove that increasing functions are differentiable almost everywhere. To this end, we define four different derivatives of a function $f$ at $x$ as follows:
$$D^+ f(x) := \limsup_{h \downarrow 0} \frac{f(x+h) - f(x)}{h}, \qquad D^- f(x) := \limsup_{h \downarrow 0} \frac{f(x) - f(x-h)}{h},$$
$$D_+ f(x) := \liminf_{h \downarrow 0} \frac{f(x+h) - f(x)}{h}, \qquad D_- f(x) := \liminf_{h \downarrow 0} \frac{f(x) - f(x-h)}{h}.$$
Of course, $f$ is differentiable at $x$ if and only if $D^+ f(x) = D^- f(x) = D_+ f(x) = D_- f(x) \ne \pm\infty$.

Lemma 2.4.5 (Lebesgue). Let $f : [a,b] \to \mathbb{R}$ be an increasing function. Then $f$ is differentiable almost everywhere, the derivative $f'$ is Lebesgue-measurable, and
$$\int_a^b f'(x)\,dx \le f(b) - f(a).$$

Proof. We first show that the sets where two of the introduced derivatives differ have outer measure zero. Let $A$ be the set on which $D^+ f(x) > D_- f(x)$; the other cases follow analogously.
It is clear that we can write $A$ as the union of the sets
$$A_{p,q} := \big\{D^+ f > p > q > D_- f\big\}$$
for $p, q \in \mathbb{Q}$, so it suffices to show that $\lambda^*(A_{p,q}) = 0$.

Let $\varepsilon > 0$, choose $p, q \in \mathbb{Q}$ with $p > q$ and denote $s := \lambda^*(A_{p,q})$. Take an open set $U \supset A_{p,q}$ with $\lambda(U) < s + \varepsilon$.