Functions of bounded variation in one and multiple dimensions

Submitted by: Simon Breneis
Carried out at: Institute of Analysis
Evaluator: Univ.-Prof. Dr. Aicke Hinrichs
Co-supervisor: O.Univ.-Prof. Dr.phil. Dr.h.c. Robert Tichy
April 2020

Master's thesis submitted in fulfilment of the requirements for the academic degree Diplom-Ingenieur in the Master's programme Mathematics in the Natural Sciences

JOHANNES KEPLER UNIVERSITÄT LINZ
Altenbergerstraße 69, 4040 Linz, Österreich
www.jku.at
DVR 0093696
Statutory Declaration

I hereby declare under oath that I have written this Master's thesis independently and without outside help, that I have not used any sources or aids other than those indicated, and that I have marked all passages taken verbatim or in substance from other works as such. This Master's thesis is identical to the electronically submitted text document.

Place, Date                                             Signature
Abstract
In this Master's thesis, we investigate the properties of functions of bounded variation. First, we consider univariate functions; afterwards we generalize this notion to higher dimensions. There are many different definitions of multivariate functions of bounded variation. We study functions of bounded variation in the senses of Vitali; Hardy and Krause; Arzelà; and Hahn. Many results for those functions of bounded variation were previously only known in the bivariate case. We extend them to arbitrary dimensions, and also add some new results.
Contents

Statutory Declaration
1 Introduction
2 Functions of one variable
  2.1 Motivation and definition
  2.2 Variation functions
  2.3 Decomposition into monotone functions
  2.4 Continuity, differentiability and measurability
  2.5 Signed Borel measures
  2.6 Dimension of the graph
  2.7 Structure of BV
  2.8 Ideal structure of BV
  2.9 Fourier series
3 Functions of multiple variables
  3.1 Definitions
  3.2 The variation functions
  3.3 Closure properties
  3.4 Decompositions into monotone functions
  3.5 Inclusions
  3.6 Continuity, differentiability and measurability
  3.7 Signed Borel measures
  3.8 Dimension of the graph
  3.9 Product functions
  3.10 Structure of the function spaces
  3.11 Ideal structure of the function spaces
4 The Koksma-Hlawka inequality
  4.1 Harman variation
  4.2 D-variation
  4.3 Koksma-Hlawka inequality for the Hahn-variation
  4.4 Other estimates
Literature
1 Introduction
The goal of this Master's thesis is to study the properties of functions of bounded variation. We study univariate functions of bounded variation in Section 2 and multivariate functions of bounded variation in Section 3. Finally, in Section 4, we give an application of functions of bounded variation to numerical integration. We define the variation of a univariate function f : [a, b] → R by

\[
\operatorname{Var}(f; a, b) := \sup\Bigl\{ \sum_{i=1}^{n} \bigl| f(x_i) - f(x_{i-1}) \bigr| : a = x_0 \le x_1 \le \dots \le x_n = b \text{ for some } n \in \mathbb{N} \Bigr\}.
\]
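As a quick numerical illustration of this definition, the inner sum can be evaluated for any fixed partition; for a monotone function it telescopes to f(b) − f(a) regardless of the partition. The following Python sketch is our own illustration (the helper name is made up) and not part of the thesis:

```python
# Illustration of the variation definition: the sum of absolute increments
# of f along a partition; for monotone f it telescopes to f(b) - f(a).

def variation_on_partition(f, partition):
    """Sum of |f(x_i) - f(x_{i-1})| along an ordered partition."""
    return sum(abs(f(x) - f(y)) for x, y in zip(partition[1:], partition[:-1]))

# Monotone example on [0, 1]: f(x) = x**2, so Var(f; 0, 1) = f(1) - f(0) = 1.
partition = [i / 100 for i in range(101)]
v = variation_on_partition(lambda x: x * x, partition)
assert abs(v - 1.0) < 1e-12
```

For a non-monotone function the sum depends on the partition, and the variation is obtained only in the supremum over all partitions.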
If the interval [a, b] is clear from the context, we also write Var(f) := Var(f; a, b). If the variation Var(f) is finite, we say that f is of bounded variation. Functions of bounded variation were first introduced by Jordan in [31] in the study of Fourier series. By now, they have many applications, for example in the study of Riemann-Stieltjes integrals.

In Section 2.1 we motivate and define functions of bounded variation in one dimension, albeit using different (non-standard) notation. The reason is that the standard notation is quite messy and hard to read in the higher-dimensional setting. Hence, to prepare the reader for Section 3, we already use notation that can be extended more easily to multivariate functions. Furthermore, we give many examples and non-examples of functions of bounded variation.

In Section 2.2 we study variation functions. To every function f : [a, b] → R of bounded variation, we can associate its variation function Var_f : [a, b] → R defined by

\[
\operatorname{Var}_f(x) := \operatorname{Var}(f; a, x).
\]

A function and its variation function share many regularity properties. For example, f is continuous if and only if Var_f is continuous, and f is Lipschitz continuous if and only if Var_f is Lipschitz continuous, see also Theorem 2.2.5. It is also easy to see that f is α-Hölder continuous if Var_f is α-Hölder continuous. The reverse direction seems to be an open problem. We answer this problem negatively with Example 2.2.6 and prove a more general statement in Theorem 2.2.17.

In Section 2.3 we give a proof of the famous result by Jordan that a function is of bounded variation if and only if it can be written as the difference of two increasing functions, see Theorem 2.3.2. Therefore, the set BV of functions of bounded variation is the vector space induced by the monotone functions.

Section 2.4 deals with the regularity properties of functions of bounded variation.
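Jordan's decomposition mentioned above admits a short explicit sketch via the variation function. The following is the standard construction (our sketch, not necessarily the proof given in Theorem 2.3.2):

```latex
% Jordan decomposition via the variation function Var_f (standard construction):
%   f = g - h with both g and h increasing on [a, b].
g(x) := \tfrac{1}{2}\bigl(\operatorname{Var}_f(x) + f(x)\bigr), \qquad
h(x) := \tfrac{1}{2}\bigl(\operatorname{Var}_f(x) - f(x)\bigr), \qquad
f = g - h.
% Monotonicity: for x \le y,
%   g(y) - g(x) = \tfrac{1}{2}\bigl(\operatorname{Var}(f; x, y) + f(y) - f(x)\bigr) \ge 0,
% since |f(y) - f(x)| \le \operatorname{Var}(f; x, y).
```

The same inequality |f(y) − f(x)| ≤ Var(f; x, y) shows that h is increasing as well.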
We show that functions of bounded variation can have at most countably many discontinuities, and that they are differentiable almost everywhere and Borel-measurable, see Theorem 2.4.2 and Theorem 2.4.6.

Section 2.5 illustrates the connection between functions of bounded variation and measures. Indeed, there is a natural correspondence between right-continuous functions of bounded variation and finite signed Borel measures, see Theorem 2.5.5.

It is a well-known result that the graph of a function of bounded variation has Hausdorff-dimension 1. In 2010, Liang proved in [38] that continuous functions of bounded variation also have Box-dimension 1, using Riemann-Liouville fractional integrals. We give a much more elementary proof of this fact in Section 2.6 and show that it is not necessary to require the function to be continuous, see Theorem 2.6.9.

Kuller proved in [35] that BV is a commutative Banach algebra with respect to pointwise multiplication. We give a proof of this fact in Section 2.7 (Theorem 2.7.8), and we also show Helly's First Theorem, which tells us that the unit ball of BV satisfies a weaker form of compactness, see Theorem
2.7.11. Consequently, in Section 2.8 we characterize the maximal ideal space of BV in Theorem 2.8.7.

Finally, in Section 2.9 we give an application of functions of bounded variation to the study of Fourier series. In particular, we prove some famous results due to Jordan, including that the Fourier series of a function of bounded variation converges pointwise (although not necessarily exactly to the function itself) in Theorem 2.9.14, and that the Fourier series of a continuous function of bounded variation converges uniformly to the function itself in Theorem 2.9.23.

The generalization of functions of bounded variation to the multidimensional setting is not immediately clear. Indeed, there are many different definitions. We study the variations in the sense of Vitali; Hardy and Krause; Arzelà; Hahn; and Pierpont. We denote by V, HK, A, H and P the corresponding sets of functions of bounded variation. In Section 3.1 we define those variations and give examples of functions of bounded variation of the various kinds. Since we show in Theorem 3.5.1 that the variations in the sense of Hahn and Pierpont are equivalent, thus extending a similar two-dimensional result by Clarkson and Adams in [13], we rarely treat the Pierpont-variation in the following chapters.

In Section 3.2, we again define the variation functions similarly as in the one-dimensional setting. Unfortunately, many results that hold for univariate functions do not extend to multivariate functions. However, in Theorem 3.2.6 we prove some previously unknown regularity correspondences similar to those in Theorem 2.2.5, especially for functions of bounded Arzelà-variation.

In Section 3.3, we prove that V, HK, A and H are vector spaces, and that HK, A and H are closed under multiplication and division (given that the denominator is bounded away from 0), see Proposition 3.3.1 and Proposition 3.3.3. Most of those results were already known, although some of them had only been proved for bivariate functions.

Similarly to the one-dimensional setting, we state monotone decomposition theorems for functions in A, V and HK in Theorem 3.4.1, Theorem 3.4.2 and Theorem 3.4.3, respectively. Naturally, since the various definitions of bounded variation do not coincide, we use different definitions of monotonicity in those theorems. We remark that those decompositions were already known, although some of them again only in two dimensions.

In Section 3.5 we study the relations between the various kinds of bounded variation. We are able to extend some (but not all) previously known results from the two-dimensional setting to arbitrary dimensions in Theorem 3.5.1.

Section 3.6 deals with the regularity properties of multivariate functions of bounded variation, where we again extend some results from the two-dimensional setting to arbitrary dimensions. In particular, we show that functions in HK, A and H are continuous almost everywhere (Theorem 3.6.1) and thus also Lebesgue-measurable, that functions in HK and A are differentiable almost everywhere (Theorem 3.6.13), and that functions in HK are Borel-measurable (Theorem 3.6.18).

In Section 3.7 we state the correspondence between right-continuous functions in HK and finite signed Borel measures, which is the precise generalization of Theorem 2.5.5 to the multidimensional setting.

Verma and Viswanathan proved in 2020 in [49] that the graph of a bivariate continuous function of bounded Hahn-variation has Hausdorff- and Box-dimension 2. In Section 3.8, we extend this result to arbitrary dimensions and get rid of the continuity condition.

All the higher-dimensional variations we consider are generalizations of the one-dimensional concept. In particular, for univariate functions the notions of variation are all equivalent. We show this already in Proposition 3.1.17.
Therefore, one might hope that we can also prove that the variations coincide for product functions, i.e. functions that are the product of one-dimensional functions.
Adams and Clarkson already noted in [1] that such connections exist for bivariate functions, although their statements were a bit imprecise and they offered few proofs. Hence, we study those product functions in Section 3.9, and show in Corollary 3.9.10 that under rather weak conditions all kinds of variations are equivalent for product functions.

Blümlinger and Tichy proved in [10] that HK is a Banach algebra with respect to pointwise multiplication. We show that A, H and P are also Banach algebras in Theorem 3.10.3, Theorem 3.10.8 and Corollary 3.10.12, respectively.

Finally, in Section 3.11 we study the maximal ideal space of the Banach algebras HK and A. Blümlinger already characterized the maximal ideal space of HK in [9], and we aim to do the same for A. However, it turns out that A has far more maximal ideals than HK, which becomes especially clear in Proposition 3.11.12. Therefore, we were unsuccessful in obtaining a characterization.

In Section 4 we study the Koksma-Hlawka inequality. The Koksma-Hlawka inequality bounds the error of approximating the integral of a function f : [0, 1]^d → R using the quadrature rule

\[
\int_{[0,1]^d} f(x) \, dx \approx \frac{1}{n} \sum_{i=1}^{n} f(x_i)
\]

for some point set P_n := {x_1, ..., x_n} ⊂ [0, 1]^d. The Koksma-Hlawka inequality states that

\[
\Bigl| \int_{[0,1]^d} f(x) \, dx - \frac{1}{n} \sum_{i=1}^{n} f(x_i) \Bigr| \le \operatorname{Var}_{HK}(f) \, \frac{\|D\|_\infty}{n}, \tag{1.1}
\]
where Var_HK(f) is the Hardy-Krause variation of f and D is the discrepancy function, which characterizes how well-distributed the point set P_n is. Hence, we are able to split the error of integration into a product of two factors, one only depending on the function and one only depending on the point set. However, there are many simple functions (like indicator functions of rotated boxes, see Example 3.1.9) that are not of bounded Hardy-Krause-variation. For those functions, the Koksma-Hlawka inequality is useless. Therefore, there have been many efforts to prove a similar inequality using a less restrictive notion of variation. To this end, we also discuss more recent concepts like the Harman variation and the D-variation. Moreover, in Theorem 4.3.1 we prove a previously unknown inequality similar to (1.1) using the Hahn-variation, which is a lot less restrictive than the Hardy-Krause-variation.
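The splitting of the integration error into a function factor and a point-set factor can be watched numerically. The following Python sketch is our own toy illustration (the midpoint lattice merely stands in for a genuine low-discrepancy point set) comparing the equal-weight quadrature rule with plain Monte Carlo for a smooth integrand:

```python
import random

# Toy illustration of the equal-weight quadrature rule behind (1.1):
# approximate the integral of f over [0,1]^2 by (1/n) * sum f(x_i).
# The regular midpoint lattice below is only a stand-in for a genuine
# low-discrepancy point set.

def quadrature(f, points):
    """Equal-weight quadrature rule (1/n) * sum of f over the point set."""
    return sum(f(p) for p in points) / len(points)

f = lambda p: p[0] * p[1]          # integral over [0,1]^2 is exactly 1/4
m = 32
lattice = [((i + 0.5) / m, (j + 0.5) / m) for i in range(m) for j in range(m)]
err_lattice = abs(quadrature(f, lattice) - 0.25)

random.seed(0)
mc = [(random.random(), random.random()) for _ in range(m * m)]
err_mc = abs(quadrature(f, mc) - 0.25)

assert err_lattice < 1e-12         # the lattice integrates x*y exactly
assert err_mc < 0.1                # Monte Carlo error is merely O(n^{-1/2})
```

For this separable integrand the lattice average factors into the product of two one-dimensional midpoint averages, which is why its error vanishes; a well-distributed point set keeps the discrepancy factor in (1.1) small for general integrands of bounded Hardy-Krause variation.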
2 Functions of one variable
We start by studying one-dimensional functions of bounded variation. The definition of those functions goes back to Jordan, see for example [30, 31], who studied functions of bounded variation in the late nineteenth century, mainly in the context of Fourier series. Most results of this section are common knowledge. The main sources for writing this chapter were the books by Carothers ([12]), Folland ([18]), Royden ([44]), Rudin ([45]) and Yeh ([50]), as well as the book by Appell, Banaś and Merentes ([5]), which is a comprehensive introduction to functions of bounded variation. However, whenever possible, we cite the mathematician who first proved a theorem.

First, we define the variation of a function and give many examples of functions of bounded variation. Next, we study the variation function, which captures the variation on subintervals and shares many properties with its parent function. We also solve an open problem on the Hölder continuity of the variation function. Then, we study the monotone decomposition of functions of bounded variation, which gives us a useful tool for extending regularity properties of monotone functions, such as almost everywhere continuity and differentiability as well as Borel-measurability, to functions of bounded variation. Furthermore, we investigate the connection between functions of bounded variation and signed Borel measures, as well as the Hausdorff and box dimension of the graph of functions of bounded variation, where we improve on a previously known result. Next, we study the functional-analytic and algebraic structure of the space of functions of bounded variation, and, finally, we give an application to Fourier series.
2.1 Motivation and definition
Let γ : [a, b] → R² be a parametrization of a "nice" curve C = γ([a, b]). How can we define the length of C? One possibility is to approximate C by a polygon with nodes a = t₀ ≤ t₁ ≤ ... ≤ t_n = b. The length of this polygon is

\[
\sum_{j=1}^{n} \bigl\| \gamma(t_j) - \gamma(t_{j-1}) \bigr\|_2. \tag{2.2}
\]

Here, ‖.‖₂ denotes the Euclidean norm on R². If we include more partition points and the curve C is smooth enough, the resulting polygons should approximate C better. It is thus reasonable to define the length of C as the limit (or better, the supremum) as the partition gets finer and finer, i.e.

\[
\ell(C) := \sup \sum_{j=1}^{n} \bigl\| \gamma(t_j) - \gamma(t_{j-1}) \bigr\|_2,
\]

where the supremum is taken over all partitions.

Similarly, one can interpret the graph of a function f : [a, b] → R as a curve in two dimensions. The variation of f then captures the vertical changes in the graph of f. To properly define this variation or vertical change of f, we introduce some notation. First, instead of partitions of an interval, we consider ladders. There are two basic differences. First, a ladder usually does not contain the end point b, whereas a partition does. Second, we treat a ladder as an unordered set. Thereby we get rid of an index, making the notation more readable in higher dimensions.

Definition 2.1.1. A ladder 𝒴 on the interval [a, b] is a finite subset of {a} ∪ (a, b) with a ∈ 𝒴. In particular, b is in 𝒴 if and only if a = b.

Let y ∈ 𝒴. Then we define the successor y₊ of y as the smallest element in 𝒴 larger than y. If there is no such element, we define y₊ as b. Similarly, we define the predecessor y₋ of y as the largest element in 𝒴 smaller than y. If there is no such element, we define y₋ as a. Finally, we define the predecessor b₋ of b as the largest element of 𝒴.

We denote by 𝕐 = 𝕐[a, b] = 𝕐(a, b) = 𝕐(ℐ) the set of ladders on ℐ.

The somewhat awkward inclusion of the case a = b will be useful in higher dimensions.
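The ladder bookkeeping (successors defaulting to b, predecessors to a) can be made concrete in a few lines. The following Python sketch of Definition 2.1.1 is our own illustration, with made-up helper names:

```python
# Sketch of the ladder notion of Definition 2.1.1: a ladder on [a, b] is a
# finite subset of {a} union (a, b) containing a; the successor of y defaults
# to b and the predecessor of y defaults to a when no ladder element qualifies.

def successor(ladder, y, b):
    """Smallest element of the ladder larger than y, or b if none exists."""
    larger = [z for z in ladder if z > y]
    return min(larger) if larger else b

def predecessor(ladder, y, a):
    """Largest element of the ladder smaller than y, or a if none exists."""
    smaller = [z for z in ladder if z < y]
    return max(smaller) if smaller else a

ladder = {0.0, 0.25, 0.5}                    # a ladder on [0, 1]
assert successor(ladder, 0.25, 1.0) == 0.5
assert successor(ladder, 0.5, 1.0) == 1.0    # no larger element -> b
assert predecessor(ladder, 0.25, 0.0) == 0.0
```

Treating the ladder as an unordered set, as here, is exactly what removes the partition index from the notation.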
For the results on one-dimensional functions, however, we assume from now on that a < b. For a ladder 𝒴 ∈ 𝕐 and a function f : ℐ → R, we write

\[
\Delta^{\mathcal{Y}} f := \sum_{y \in \mathcal{Y}} \bigl| f(y_+) - f(y) \bigr|
\]

for the variation of f on the ladder 𝒴; passing to a finer ladder can only increase this quantity (Proposition 2.1.2). Therefore, we define the variation of a function as the supremum over all variations on ladders, as finer ladders capture more of the oscillation of a function.

Definition 2.1.3. For a function f : ℐ → R with ℐ = [a, b], we define its total variation as

\[
\operatorname{Var}(f; \mathcal{I}) := \operatorname{Var}(f; a, b) := \sup_{\mathcal{Y} \in \mathbb{Y}} \Delta^{\mathcal{Y}} f.
\]

If the interval ℐ is clear from the context, we also write Var(f) instead of Var(f; ℐ). We say that f is of bounded (total) variation if Var(f) < ∞. Finally, we denote by BV := BV[a, b] the set of functions on [a, b] with bounded total variation.

Example 2.1.4. Indicator functions of intervals are of bounded total variation. For example, if [c, d] ⊂ (a, b), then the indicator function 1_{[c,d]} has total variation 2.

Example 2.1.5. Monotone functions are of bounded variation. If f is monotonically increasing, then

\[
\operatorname{Var}(f; a, b) = \sup_{\mathcal{Y} \in \mathbb{Y}} \sum_{y \in \mathcal{Y}} \bigl| f(y_+) - f(y) \bigr| = \sup_{\mathcal{Y} \in \mathbb{Y}} \sum_{y \in \mathcal{Y}} \bigl( f(y_+) - f(y) \bigr) = f(b) - f(a) < \infty.
\]

The same holds for monotonically decreasing f, except for a sign change.

Example 2.1.6. Lipschitz continuous functions are also of bounded variation. Let f be a Lipschitz continuous function with Lipschitz constant L. Then

\[
\operatorname{Var}(f; a, b) = \sup_{\mathcal{Y} \in \mathbb{Y}} \sum_{y \in \mathcal{Y}} \bigl| f(y_+) - f(y) \bigr| \le \sup_{\mathcal{Y} \in \mathbb{Y}} \sum_{y \in \mathcal{Y}} L \, |y_+ - y| = L(b - a) < \infty.
\]

Example 2.1.7. Absolutely continuous functions are of bounded variation. Recall that a function f : [a, b] → R is called absolutely continuous if for all ε > 0 there exists a δ > 0 such that for all finite families of disjoint open subintervals {(a₁, b₁), ..., (a_n, b_n)} of [a, b] with Σᵢ₌₁ⁿ (bᵢ − aᵢ) ≤ δ, we have

\[
\sum_{i=1}^{n} \bigl| f(b_i) - f(a_i) \bigr| \le \varepsilon.
\]

To prove that such functions are of bounded variation, choose ε = 1 and take δ > 0 as in the definition of absolute continuity. Define the ladder

\[
\mathcal{Y}^* := \Bigl\{ y \in [a, b) : y = a + \tfrac{k}{2}\delta \text{ for some } k \in \mathbb{N}_0 \Bigr\}.
\]

Clearly, 𝒴* contains

\[
n := \Bigl\lceil \frac{2(b-a)}{\delta} \Bigr\rceil
\]

points. For a ladder 𝒴 ∈ 𝕐 we define the ladder 𝒴′ := 𝒴 ∪ 𝒴*.
Then, we define the ladders

\[
\mathcal{Y}_k := \mathcal{Y}' \cap \Bigl[ a + \tfrac{k}{2}\delta, \; a + \tfrac{k+1}{2}\delta \Bigr)
\]

on [a + (k/2)δ, a + ((k+1)/2)δ] for k = 0, ..., n. We apply the absolute continuity of f to the intervals induced by the ladder 𝒴_k on [a + (k/2)δ, a + ((k+1)/2)δ] and get

\[
\sum_{y \in \mathcal{Y}} \bigl| f(y_+) - f(y) \bigr| \le \sum_{y \in \mathcal{Y}'} \bigl| f(y_+) - f(y) \bigr| = \sum_{k=0}^{n} \sum_{y \in \mathcal{Y}_k} \bigl| f(y_+) - f(y) \bigr| \le \sum_{k=0}^{n} 1 = n + 1 < \infty.
\]

Thus, f is of bounded variation.

Example 2.1.8. The length of a curve C with parametrization γ : [a, b] → R² and γ(t) = (x(t), y(t)) can be analogously defined using ladders by

\[
\ell(C) := \sup_{\mathcal{Y} \in \mathbb{Y}} \sum_{y \in \mathcal{Y}} \bigl\| \gamma(y_+) - \gamma(y) \bigr\|_2.
\]

We call C rectifiable if ℓ(C) < ∞. Then C is rectifiable if and only if both x and y are of bounded total variation. This illustrates that a curve is rectifiable if and only if the horizontal and vertical variations of that curve are finite. The equivalence follows immediately from the observation that

\[
\max\bigl\{ |x(t) - x(s)|, \; |y(t) - y(s)| \bigr\} \le \bigl\| \gamma(t) - \gamma(s) \bigr\|_2 \le |x(t) - x(s)| + |y(t) - y(s)|.
\]

Example 2.1.9. If f : [a, b] → R is differentiable, then one can prove that

\[
\operatorname{Var}(f; a, b) \ge \int_a^b |f'(x)| \, dx. \tag{2.3}
\]

Thus, the function f(x) = x sin(x⁻¹) is of unbounded variation on [0, 1]. In particular, we show in Theorem 2.4.6 that functions of bounded variation are differentiable almost everywhere and also satisfy (2.3). Moreover, if f is absolutely continuous, we have equality in (2.3), as was shown in [5, Theorem 3.19].

Example 2.1.10. Other examples of functions of unbounded variation are the indicator function 1_ℚ on [0, 1], or paths of Brownian motion, which are of unbounded variation with probability 1.

2.2 Variation functions

The variation function of a function f : [a, b] → R captures the variation of f on all the intervals [a, x] for x ∈ [a, b]. We show that variation functions are increasing and that they share many regularity properties with their parent functions.
However, we also show that the variation function of a Hölder continuous function need not be Hölder continuous, solving an open problem.

Definition 2.2.1. The variation function Var_f : [a, b] → [0, ∞] of a function f : [a, b] → R is defined as

\[
\operatorname{Var}_f(x) := \operatorname{Var}(f; a, x).
\]

Conversely, f is called the parent function of Var_f.

First, we note that variation functions are always increasing.

Proposition 2.2.2. Let f : ℐ → R be a function and let c ∈ ℐ. Then

\[
\operatorname{Var}(f; a, b) = \operatorname{Var}(f; a, c) + \operatorname{Var}(f; c, b).
\]

In particular, for a ≤ x ≤ y ≤ b,

\[
\operatorname{Var}_f(y) - \operatorname{Var}_f(x) = \operatorname{Var}(f; x, y) \ge 0,
\]

and the variation function Var_f is increasing.

Proof. If c = b, this is trivial. Assume that c < b and let 𝒴 be a ladder on ℐ. By Proposition 2.1.2 we may assume that c ∈ 𝒴. Define the ladders 𝒴₁ := {y ∈ 𝒴 : y < c} and 𝒴₂ := {y ∈ 𝒴 : y ≥ c} on [a, c] and [c, b], respectively. Then

\[
\Delta^{\mathcal{Y}}(f; a, b) = \Delta^{\mathcal{Y}_1}(f; a, c) + \Delta^{\mathcal{Y}_2}(f; c, b).
\]

Taking the supremum over all ladders 𝒴 ∈ 𝕐[a, b] yields

\[
\operatorname{Var}(f; a, b) \le \operatorname{Var}(f; a, c) + \operatorname{Var}(f; c, b),
\]

since we can assume without loss of generality that every ladder in 𝕐[a, b] contains c. Conversely, let 𝒴₁ and 𝒴₂ be ladders on [a, c] and [c, b], respectively. Then 𝒴 := 𝒴₁ ∪ 𝒴₂ is a ladder on [a, b] and

\[
\Delta^{\mathcal{Y}}(f; a, b) = \Delta^{\mathcal{Y}_1}(f; a, c) + \Delta^{\mathcal{Y}_2}(f; c, b).
\]

Taking the supremum over all ladders 𝒴₁ ∈ 𝕐[a, c] and 𝒴₂ ∈ 𝕐[c, b] yields

\[
\operatorname{Var}(f; a, b) \ge \operatorname{Var}(f; a, c) + \operatorname{Var}(f; c, b).
\]

This implies the desired equality.

We prove a slight generalization of the above proposition.

Lemma 2.2.3. Let f : [a, b] → R be a function with f(a+) = f(a). Let b = z₀ > z₁ > z₂ > ... be a strictly decreasing sequence in [a, b] that converges to a. Then

\[
\operatorname{Var}(f; a, b) = \sum_{n=0}^{\infty} \operatorname{Var}(f; z_{n+1}, z_n).
\]

Proof. First, the series converges (potentially to infinity), since all the terms are non-negative. Applying Proposition 2.2.2, we have for k ∈ ℕ that

\[
\operatorname{Var}(f; a, b) \ge \operatorname{Var}(f; z_k, z_0) = \sum_{n=0}^{k-1} \operatorname{Var}(f; z_{n+1}, z_n).
\]

Taking k to infinity yields

\[
\operatorname{Var}(f; a, b) \ge \sum_{n=0}^{\infty} \operatorname{Var}(f; z_{n+1}, z_n).
\]
On the other hand, let ε > 0 and let 𝒴 be a ladder on [a, b]. Let k ∈ ℕ be such that a < z_k < a₊ and |f(a) − f(z_k)| < ε. Such a k exists, since z_k → a and therefore f(z_k) → f(a). Proposition 2.2.2 yields

\[
\begin{aligned}
\sum_{y \in \mathcal{Y}} \bigl| f(y_+) - f(y) \bigr| &= \sum_{y \in \mathcal{Y} \setminus \{a\}} \bigl| f(y_+) - f(y) \bigr| + \bigl| f(a_+) - f(a) \bigr| \\
&\le \operatorname{Var}(f; a_+, b) + \bigl| f(a_+) - f(z_k) \bigr| + \bigl| f(z_k) - f(a) \bigr| \\
&\le \operatorname{Var}(f; a_+, b) + \operatorname{Var}(f; z_k, a_+) + \varepsilon = \operatorname{Var}(f; z_k, b) + \varepsilon \\
&= \sum_{n=0}^{k-1} \operatorname{Var}(f; z_{n+1}, z_n) + \varepsilon \le \sum_{n=0}^{\infty} \operatorname{Var}(f; z_{n+1}, z_n) + \varepsilon.
\end{aligned}
\]

Since ε > 0 was arbitrary,

\[
\sum_{y \in \mathcal{Y}} \bigl| f(y_+) - f(y) \bigr| \le \sum_{n=0}^{\infty} \operatorname{Var}(f; z_{n+1}, z_n).
\]

Taking the supremum over all ladders 𝒴 ∈ 𝕐[a, b] yields

\[
\operatorname{Var}(f; a, b) \le \sum_{n=0}^{\infty} \operatorname{Var}(f; z_{n+1}, z_n),
\]

which proves the lemma.

The variation function and its parent function share many regularity properties. To state these connections, we need some definitions.

Definition 2.2.4. Let (X, d) be a metric space. Let 𝒞(X) denote the set of continuous functions on X.

A function f : X → R is called Lipschitz continuous if there exists a constant L > 0 such that for all x₁, x₂ ∈ X,

\[
\bigl| f(x_1) - f(x_2) \bigr| \le L \, d(x_1, x_2). \tag{2.4}
\]

Furthermore, we denote by

\[
\operatorname{lip}(f) := \operatorname{lip}(f; X) := \sup_{x_1 \ne x_2} \frac{\bigl| f(x_1) - f(x_2) \bigr|}{d(x_1, x_2)}
\]

the minimal Lipschitz constant L in (2.4). The set of Lipschitz continuous functions on X is denoted by Lip = Lip(X).

A function f : X → R is called α-Hölder continuous (with 0 < α < 1) if there exists a constant L > 0 such that for all x₁, x₂ ∈ X,

\[
\bigl| f(x_1) - f(x_2) \bigr| \le L \, d(x_1, x_2)^\alpha. \tag{2.5}
\]

Furthermore, we denote by

\[
\operatorname{lip}_\alpha(f) := \operatorname{lip}_\alpha(f; X) := \sup_{x_1 \ne x_2} \frac{\bigl| f(x_1) - f(x_2) \bigr|}{d(x_1, x_2)^\alpha}
\]

the minimal Hölder constant L in (2.5). The set of α-Hölder continuous functions on X is denoted by Lip_α = Lip_α(X).

Let ℐ = [a, b] be an interval. Then a function f : ℐ → R is called absolutely continuous if for all ε > 0 there exists a δ > 0 such that for every finite sequence of pairwise disjoint intervals (x_k, y_k) ⊂ ℐ that satisfies

\[
\sum_k (y_k - x_k) < \delta,
\]

we have

\[
\sum_k \bigl| f(y_k) - f(x_k) \bigr| < \varepsilon.
\]
The set of all absolutely continuous functions on [a, b] is denoted by AC = AC[a, b] = AC(ℐ). Finally, for an interval ℐ, 𝒞¹(ℐ) denotes the set of continuously differentiable real-valued functions.

We remind the reader that the inclusions

\[
\mathcal{C}^1 \subseteq \operatorname{Lip} \subseteq \operatorname{Lip}_\alpha \subseteq \operatorname{Lip}_\beta \subseteq \mathcal{C}
\]

hold for α ≥ β.

Before stating the connections between the variation function and its parent function, we define the left- and right-side limits of a function f at x₀ as

\[
f(x_0-) := \lim_{\varepsilon \downarrow 0} f(x_0 - \varepsilon) \quad \text{and} \quad f(x_0+) := \lim_{\varepsilon \downarrow 0} f(x_0 + \varepsilon),
\]

if they exist. The following theorem is a selection of statements due to Huggins [29] and Russell [46].

Theorem 2.2.5. Let f : [a, b] → R be a function and let Var_f be its variation function. Then the following statements hold.

1. The function f is of bounded variation if and only if the function Var_f is of bounded variation. Moreover, in this case we have Var(Var_f) = Var(f).
2. If f is of bounded variation, then f is (left-/right-)continuous if and only if Var_f is (left-/right-)continuous.
3. If f is of bounded variation, then f is Lipschitz continuous if and only if Var_f is Lipschitz continuous. Moreover, in this case we have lip(Var_f) = lip(f).
4. If f is of bounded variation, then f is α-Hölder continuous if Var_f is α-Hölder continuous. Moreover, in this case we have lip_α(Var_f) ≥ lip_α(f).
5. If f is of bounded variation, then f is absolutely continuous if and only if Var_f is absolutely continuous.

Proof. 1. First, let f be of bounded variation. Since Var_f is increasing and Var_f(a) = 0, it is easy to see that

\[
\operatorname{Var}(\operatorname{Var}_f) = \operatorname{Var}_f(b) - \operatorname{Var}_f(a) = \operatorname{Var}_f(b) = \operatorname{Var}(f),
\]

implying that Var_f is of bounded variation. Conversely, let Var_f be of bounded variation. Then Var(f) = Var_f(b) < ∞, as otherwise the variation of Var_f would be undefined.

2. We prove that f is right-continuous if and only if Var_f is right-continuous. Similarly, f is left-continuous if and only if Var_f is left-continuous.
Together, this shows that f is continuous if and only if Var_f is continuous.

Let f be right-continuous at x. Let ε > 0 be arbitrary and let δ > 0 be such that |f(x) − f(x+h)| < ε/2 for all 0 ≤ h < δ. Let 𝒴₀ ∈ 𝕐[x, b] be such that

\[
\sum_{y \in \mathcal{Y}_0} \bigl| f(y_+) - f(y) \bigr| \ge \operatorname{Var}(f; x, b) - \varepsilon/2.
\]

Using Proposition 2.1.2, we can assume that there is a y₀ ∈ 𝒴₀ with x < y₀ < x + δ, so that

\[
\operatorname{Var}(f; x, b) - \varepsilon/2 \le \bigl| f(y_0) - f(x) \bigr| + \operatorname{Var}(f; y_0, b) \le \varepsilon/2 + \operatorname{Var}(f; y_0, b).
\]

Hence,

\[
0 \le \operatorname{Var}_f(y_0) - \operatorname{Var}_f(x) = \operatorname{Var}(f; a, y_0) - \operatorname{Var}(f; a, x) = \operatorname{Var}(f; x, y_0) = \operatorname{Var}(f; x, b) - \operatorname{Var}(f; y_0, b) \le \varepsilon,
\]

and since Var_f is increasing, the same bound holds for all points in (x, y₀). Thus, Var_f is right-continuous at x.

On the other hand, if Var_f is right-continuous at x, then it follows from Proposition 2.2.2 that

\[
\bigl| f(x+h) - f(x) \bigr| \le \operatorname{Var}(f; x, x+h) = \operatorname{Var}(f; a, x+h) - \operatorname{Var}(f; a, x) = \operatorname{Var}_f(x+h) - \operatorname{Var}_f(x),
\]

which implies that f is right-continuous at x.

3. If f is Lipschitz continuous and a ≤ x ≤ y ≤ b, then

\[
\bigl| \operatorname{Var}_f(x) - \operatorname{Var}_f(y) \bigr| = \operatorname{Var}(f; x, y) \le \operatorname{lip}(f) \, |x - y|
\]

by Example 2.1.6. Conversely, if Var_f is Lipschitz continuous and a ≤ x ≤ y ≤ b, then

\[
\bigl| f(y) - f(x) \bigr| \le \operatorname{Var}(f; x, y) = \operatorname{Var}_f(y) - \operatorname{Var}_f(x) \le \operatorname{lip}(\operatorname{Var}_f) \, |y - x|.
\]

4. If Var_f is α-Hölder continuous and a ≤ x ≤ y ≤ b, then

\[
\bigl| f(y) - f(x) \bigr| \le \operatorname{Var}(f; x, y) = \operatorname{Var}_f(y) - \operatorname{Var}_f(x) \le \operatorname{lip}_\alpha(\operatorname{Var}_f) \, |y - x|^\alpha.
\]

5. Let f be absolutely continuous. Let ε > 0 and let δ > 0 be such that for all finite disjoint sequences of intervals (x_k, y_k) ⊂ ℐ with

\[
\sum_k (y_k - x_k) < \delta \tag{2.6}
\]

we have

\[
\sum_k \bigl| f(y_k) - f(x_k) \bigr| < \varepsilon.
\]

Let (x₁, y₁), ..., (x_n, y_n) be a disjoint sequence of intervals satisfying (2.6). On the interval [x_k, y_k] we can find a ladder 𝒴_k = {y_{k,1}, ..., y_{k,m_k}} with

\[
\sum_{y \in \mathcal{Y}_k} \bigl| f(y_+) - f(y) \bigr| + \varepsilon/n \ge \operatorname{Var}(f; x_k, y_k).
\]

Then

\[
\sum_{k=1}^{n} \bigl( \operatorname{Var}_f(y_k) - \operatorname{Var}_f(x_k) \bigr) = \sum_{k=1}^{n} \operatorname{Var}(f; x_k, y_k) \le \sum_{k=1}^{n} \Bigl( \sum_{y \in \mathcal{Y}_k} \bigl| f(y_+) - f(y) \bigr| + \frac{\varepsilon}{n} \Bigr) \le 2\varepsilon
\]

by the absolute continuity of f. This shows that Var_f is absolutely continuous. Conversely, assume that Var_f is absolutely continuous.
Let ε > 0 and let δ > 0 be such that for all finite disjoint sequences of intervals (x_k, y_k) ⊂ ℐ with (2.6) we have

\[
\sum_k \bigl( \operatorname{Var}_f(y_k) - \operatorname{Var}_f(x_k) \bigr) < \varepsilon.
\]

Then

\[
\sum_k \bigl| f(y_k) - f(x_k) \bigr| \le \sum_k \bigl( \operatorname{Var}_f(y_k) - \operatorname{Var}_f(x_k) \bigr) < \varepsilon,
\]

implying that f is absolutely continuous.

Note the asymmetry in the fourth statement of the preceding theorem. In fact, it seems to be an open question whether the reverse direction holds (see [5, p. 80]). Here, we show with the following example that the reverse does not hold.

Example 2.2.6. Let 0 < α < 1. We construct a function f that is of bounded variation and α-Hölder continuous, such that Var_f is γ-Hölder continuous for no γ ∈ (0, 1).

First, consider the following general example. Let x₁ > x₂ > ... > 0 be a sequence with x_n → 0 and let (y_n) be a sequence with y₂ > y₄ > ... > 0, y_{2n−1} = 0 for n ∈ ℕ and y_n → 0. Define the function f : [0, x₁] → R as f(x_n) = y_n and interpolate linearly in between. An example of such a function is shown in the picture below.

[Figure: the blue graph is the function f on the interval [x₁₁, x₁]; the red graph is the function x ↦ x^α.]

The values y_{2n} were chosen smaller than x_{2n}^α in order to ensure that f is α-Hölder continuous at 0. It remains to choose the sequences (x_n) and (y_n) appropriately.

First, the variation function Var_f is easy to determine. Using Lemma 2.2.3, we have

\[
\operatorname{Var}_f(x_{2n-1}) = \operatorname{Var}(f; 0, x_{2n-1}) = 2 \sum_{k=n}^{\infty} y_{2k}.
\]

We want Var_f to be γ-Hölder continuous for no γ ∈ (0, 1). In order to achieve this, we can choose the sequence (y_n) to be decreasing as slowly as possible. Since f should be of bounded variation, however, it needs to fall faster than n⁻¹, as otherwise the series diverges. Therefore, we set

\[
y_{2n} = \frac{1}{2n \, (\log(n+1))^2}.
\]

With this choice, f is of bounded variation, since

\[
\operatorname{Var}(f) = \operatorname{Var}_f(x_1) = \sum_{n=1}^{\infty} \frac{1}{n \, (\log(n+1))^2} < \infty.
\]

Now we have to choose the sequence (x_n).
Its decay should be slow enough so that f is α-Hölder continuous, but fast enough so that Var_f is γ-Hölder continuous for no γ ∈ (0, 1). We set

\[
x_{2n-1} = n^{-\beta}
\]

for an appropriate choice of β > 0 that remains to be determined, and

\[
x_{2n} = \frac{x_{2n-1} + x_{2n+1}}{2}.
\]

First, note that

\[
\operatorname{Var}_f(n^{-\beta}) = \operatorname{Var}_f(x_{2n-1}) = \sum_{k=n}^{\infty} \frac{1}{k \, (\log(k+1))^2} \ge \int_{n+1}^{\infty} \frac{1}{x (\log x)^2} \, dx = \frac{1}{\log(n+1)}.
\]

Therefore, for γ ∈ (0, 1), we have

\[
\sup_{x \in (0, x_1]} \frac{\operatorname{Var}_f(x)}{x^\gamma} \ge \sup_{n \in \mathbb{N}} \frac{\operatorname{Var}_f(n^{-\beta})}{n^{-\beta\gamma}} \ge \sup_{n \in \mathbb{N}} \frac{n^{\beta\gamma}}{\log(n+1)} = \infty,
\]

since βγ > 0. Hence, Var_f is not γ-Hölder continuous, regardless of our choice of β > 0.

It remains to ensure that f is α-Hölder continuous. First, f needs to be α-Hölder continuous at 0, i.e.

\[
\sup_{x \in (0, x_1]} \frac{f(x)}{x^\alpha} \le \sup_{n \in \mathbb{N}} \frac{f(x_{2n})}{x_{2n+3}^\alpha} \le \sup_{n \in \mathbb{N}} \frac{\frac{1}{2n(\log(n+1))^2}}{(n+2)^{-\alpha\beta}} = \sup_{n \in \mathbb{N}} \frac{(n+2)^{\alpha\beta}}{2n(\log(n+1))^2} \le \sup_{n \in \mathbb{N}} \frac{(3n)^{\alpha\beta}}{2n(\log 2)^2} \le \frac{3^{\alpha\beta}}{2(\log 2)^2} \sup_{n \in \mathbb{N}} n^{\alpha\beta - 1} < \infty.
\]

Therefore, we choose β such that 0 < β ≤ α⁻¹.

Second, due to the specific structure of f, it is apparent that

\[
\begin{aligned}
\sup_{x, y \in (0, x_1]} \frac{|f(x) - f(y)|}{|x - y|^\alpha}
&= \sup_{n \in \mathbb{N}} \frac{f(x_{2n})}{\bigl( \frac{x_{2n-1} - x_{2n+1}}{2} \bigr)^\alpha}
 = \sup_{n \in \mathbb{N}} \frac{\frac{1}{2n(\log(n+1))^2}}{\bigl( \frac{n^{-\beta} - (n+1)^{-\beta}}{2} \bigr)^\alpha} \\
&\le \sup_{n \in \mathbb{N}} \frac{2^{\alpha-1}}{n (\log 2)^2 \, \bigl( (n+1)^{-\beta-1} \bigr)^\alpha}
 = \frac{2^{\alpha-1}}{(\log 2)^2} \sup_{n \in \mathbb{N}} \frac{(n+1)^{\alpha(\beta+1)}}{n} \\
&\le \frac{2^\alpha}{(\log 2)^2} \sup_{n \in \mathbb{N}} \frac{(n+1)^{\alpha(\beta+1)}}{n+1}
 \le \frac{2^\alpha}{(\log 2)^2} \sup_{n \in \mathbb{N}} n^{\alpha(\beta+1) - 1}.
\end{aligned}
\]

The last supremum is finite if α(β+1) − 1 ≤ 0, i.e. if β ≤ α⁻¹ − 1. Hence, f is α-Hölder continuous if

\[
0 < \beta \le \min\{\alpha^{-1}, \, \alpha^{-1} - 1\} = \alpha^{-1} - 1.
\]

Since α < 1, the choice of such a β > 0 is possible. Therefore, the function f constructed this way is α-Hölder continuous, but Var_f is γ-Hölder continuous for no γ ∈ (0, 1).

We can greatly generalize the above result. To this end, we introduce moduli of continuity.

Definition 2.2.7. A continuous, increasing function ω : [0, ∞) → [0, ∞) with ω(0) = 0 is called a modulus of continuity.

We remark that this is not the most general definition used for moduli of continuity.
Often, the requirement that ω is increasing is dropped, and continuity is replaced with continuity at zero. The reason for our more restrictive definition is to achieve simpler and clearer statements and better consistency with the coming definitions. Proposition 2.2.10 illustrates, however, that our definition is in some sense the most general one.

Moduli of continuity are usually not used by themselves. Instead, they are helpful in characterizing how continuous a given function is.

Definition 2.2.8. Let ℐ ⊂ R be a bounded or unbounded interval and let f : ℐ → R be a function. A modulus of continuity ω is called a modulus of continuity for f if for all x, y ∈ ℐ, we have

\[
\bigl| f(x) - f(y) \bigr| \le \omega\bigl( |x - y| \bigr).
\]

Examples of moduli of continuity are x ↦ Lx and x ↦ Lx^α for 0 < α ≤ 1. They characterize the Lipschitz and the α-Hölder continuous functions with Lipschitz and α-Hölder constant L, respectively.

It is easy to see that, given a function f and two moduli of continuity ω₁ ≤ ω₂, if ω₁ is a modulus of continuity for f, so is ω₂. In that sense, larger moduli of continuity represent weaker continuity conditions. In particular, to every continuous function we can associate its minimal modulus of continuity.

Definition 2.2.9. Let ℐ ⊂ R be a bounded or unbounded interval and let f : ℐ → R be a continuous function. The minimal modulus of continuity of f is defined as

\[
\omega_f(h) := \sup\bigl\{ |f(x) - f(y)| : x, y \in \mathcal{I}, \; |x - y| \le h \bigr\}.
\]

We state the following facts about minimal moduli of continuity.

Proposition 2.2.10. Let f : [a, b] → R be a continuous function. Then ω_f is a modulus of continuity for f, is subadditive and satisfies ω_{ω_f} = ω_f. Moreover, if ω is a modulus of continuity for f, then ω_f ≤ ω.

Proof. It is obvious from the definition that ω_f(0) = 0 and that ω_f is increasing. Furthermore, note that ω_f(h) is finite for all h ∈ [0, ∞). This is because f is continuous on the compact set [a, b], and hence bounded.
We show that $\omega_f$ is subadditive. Let $s, t \ge 0$. Then
$$\omega_f(s+t) = \sup\big\{|f(x) - f(y)| : x, y \in [a,b],\ |x-y| \le s+t\big\}$$
$$= \sup\big\{|f(x) - f(z) + f(z) - f(y)| : x, y, z \in [a,b],\ |x-z| \le s,\ |z-y| \le t\big\}$$
$$\le \sup\big\{|f(x) - f(z)| + |f(z) - f(y)| : x, y, z \in [a,b],\ |x-z| \le s,\ |z-y| \le t\big\}$$
$$\le \sup\big\{|f(x) - f(z)| : x, z \in [a,b],\ |x-z| \le s\big\} + \sup\big\{|f(z) - f(y)| : y, z \in [a,b],\ |y-z| \le t\big\}$$
$$= \omega_f(s) + \omega_f(t).$$
Next, we show that $\omega_f$ is continuous at zero. Since $f$ is continuous on the compact set $[a,b]$, it is uniformly continuous. Hence, for all $\varepsilon > 0$ there exists a $\delta > 0$ such that $|f(x) - f(y)| \le \varepsilon$ for all $|x - y| \le \delta$ with $x, y \in [a,b]$. In particular, $\omega_f(\delta) \le \varepsilon$. Since $\varepsilon$ was arbitrary and $\omega_f$ is increasing, we have $\omega_f(0+) = \omega_f(0) = 0$.

Now we prove that $\omega_f$ is continuous everywhere. Let $t, h > 0$. Since $\omega_f$ is subadditive and increasing,
$$\omega_f(t) \le \omega_f(t+h) \le \omega_f(t) + \omega_f(h).$$
Taking $h$ to zero and using that $\omega_f(0+) = 0$ yields that $\omega_f$ is right-continuous. The left-continuity of $\omega_f$ follows similarly from
$$\omega_f(t) \le \omega_f(t-h) + \omega_f(h) \le \omega_f(t) + \omega_f(h).$$
Altogether, $\omega_f$ is continuous. We have shown that $\omega_f$ is a modulus of continuity. Now it is trivial that $\omega_f$ is also a modulus of continuity for $f$.

To show that $\omega_{\omega_f} = \omega_f$, let $h \ge 0$. Since $\omega_f$ is increasing,
$$\omega_{\omega_f}(h) = \sup\big\{|\omega_f(x) - \omega_f(y)| : x, y \ge 0,\ |x-y| \le h\big\} = \sup\big\{\omega_f(x+h) - \omega_f(x) : x \ge 0\big\} \ge \omega_f(0+h) - \omega_f(0) = \omega_f(h).$$
On the other hand, since $\omega_f$ is subadditive,
$$\omega_{\omega_f}(h) = \sup\big\{\omega_f(x+h) - \omega_f(x) : x \ge 0\big\} \le \sup\big\{\omega_f(x) + \omega_f(h) - \omega_f(x) : x \ge 0\big\} = \omega_f(h).$$
Finally, let $\omega$ be another modulus of continuity for $f$. If there exists an $h \ge 0$ with $\omega(h) < \omega_f(h)$, then there are two points $x, y \in [a,b]$ with $|x-y| \le h$ and $|f(x) - f(y)| > \omega(h)$. Since $\omega$ is a modulus of continuity for $f$, and since $\omega$ is increasing,
$$\omega(h) < |f(x) - f(y)| \le \omega(|x-y|) \le \omega(h),$$
a contradiction.

The fourth statement of Theorem 2.2.5 can be easily generalized to moduli of continuity.

Proposition 2.2.11.
Let $f : [a,b] \to \mathbb{R}$ be a continuous function of bounded variation. Then $\omega_f \le \omega_{\mathrm{Var}_f}$.

Proof. Since $f$ is continuous and of bounded variation, $\mathrm{Var}_f$ is also continuous by Theorem 2.2.5. Therefore, $\omega_{\mathrm{Var}_f}$ is well-defined. Now, for $a \le x \le y \le b$ with $y - x \le h$, we have with Proposition 2.2.2 that
$$|f(y) - f(x)| \le \mathrm{Var}(f; x, y) = \mathrm{Var}_f(y) - \mathrm{Var}_f(x) \le \omega_{\mathrm{Var}_f}(y - x) \le \omega_{\mathrm{Var}_f}(h).$$
Taking the supremum over all $x$ and $y$ as above yields $\omega_f(h) \le \omega_{\mathrm{Var}_f}(h)$.

Our goal is to show that the converse of Proposition 2.2.11 does not hold. In fact, given two (almost arbitrary) moduli of continuity $\omega, \omega'$, we show that there exists a function $f$ of bounded variation with $\omega_f \le \omega$ but $\omega_{\mathrm{Var}_f} \ge \omega'$.

We require a modulus of continuity to be increasing and continuous. However, we need additional regularity properties. The following lemmas show that we can assume those regularity properties without loss of generality.

Lemma 2.2.12. Let $\omega$ be a bounded modulus of continuity. Then there exists a modulus of continuity $\omega' \ge \omega$ with $\omega'(h) = \omega'(1)$ for all $h \ge 1$.

Proof. Clearly, the function
$$\omega'(h) = \begin{cases} \omega(h) + \big(\|\omega\|_{\infty} - \omega(1)\big)h & h \in [0,1] \\ \|\omega\|_{\infty} & h \in (1,\infty) \end{cases}$$
is a modulus of continuity, $\omega' \ge \omega$, and $\omega'(h) = \omega'(1)$ for $h \ge 1$.

Lemma 2.2.13. Let $\omega$ be a modulus of continuity with $\omega(h) = \omega(1)$ for $h \ge 1$. Then $\omega_{\omega} \ge \omega$, and $\omega_{\omega}(h) = \omega_{\omega}(1)$ for $h \ge 1$.

Proof. First, for $h \ge 0$ we have
$$\omega_{\omega}(h) = \sup\big\{|\omega(x) - \omega(y)| : x, y \ge 0,\ |x-y| \le h\big\} \ge \omega(h) - \omega(0) = \omega(h).$$
Second, notice that $0 = \omega(0) \le \omega(h) \le \omega(1)$ for all $h \ge 0$, since $\omega$ is increasing. Hence,
$$\omega_{\omega}(h) = \sup\big\{|\omega(x) - \omega(y)| : x, y \ge 0,\ |x-y| \le h\big\} \le \omega(1) - \omega(0) = \omega(1).$$
On the other hand, for $h \ge 1$,
$$\omega_{\omega}(h) = \sup\big\{|\omega(x) - \omega(y)| : x, y \ge 0,\ |x-y| \le h\big\} \ge \omega(1) - \omega(0) = \omega(1).$$
Hence, $\omega_{\omega}(h) = \omega_{\omega}(1) = \omega(1)$ for $h \ge 1$.

Lemma 2.2.14. Let $\omega$ be a modulus of continuity that satisfies $\omega_{\omega} = \omega$ and $\omega(h) = \omega(1)$ for $h \ge 1$. Then there exists a concave modulus of continuity $\omega' \ge \omega$ with $\omega_{\omega'} = \omega'$ and $\omega'(h) = \omega'(1)$ for $h \ge 1$.

Proof.
Define $\omega'$ as the concave majorant of $\omega$, i.e.
$$\omega'(h) := \inf\big\{\alpha h + \beta : \alpha t + \beta \ge \omega(t) \text{ for all } t \ge 0\big\}.$$
Clearly, $\omega' \ge \omega$. In particular, $\omega'$ is non-negative and $\omega'(h) \ge \omega(h) = \omega(1)$ for $h \ge 1$. Also, since $\omega(1) \ge \omega(t)$ for all $t \ge 0$, $\omega'(h) \le \omega(1)$ for all $h \ge 0$. Therefore, $\omega'(h) = \omega'(1) = \omega(1)$ for all $h \ge 1$.

We show that $\omega'(0) = 0$. If $\omega(h) = 0$ for all $h \ge 0$, this is trivial. Otherwise, for all $\varepsilon \in (0, \omega(1))$ there exists a $\delta > 0$ such that $\omega(h) < \varepsilon$ for $h \le \delta$, since $\omega(0+) = \omega(0) = 0$. Define
$$\alpha = \frac{\omega(1) - \varepsilon}{\delta}.$$
Then $\alpha t + \varepsilon \ge \omega(t)$ for all $t \ge 0$. Since $\varepsilon > 0$ was arbitrary, $\omega'(0) = 0$.

Next, we show that $\omega'$ is increasing. Since $\omega$ is non-negative, we can restrict the infimum in the definition of $\omega'$ to non-negative values of $\alpha$ (negative values of $\alpha$ lead to negative values of $\alpha t + \beta$ for $t$ sufficiently large). Let $t, h, \varepsilon > 0$ and let $\alpha \ge 0$, $\beta \in \mathbb{R}$ be such that
$$\omega(s) \le \alpha s + \beta \quad \text{for all } s \ge 0$$
and
$$\omega'(t+h) \ge \alpha(t+h) + \beta - \varepsilon.$$
Then,
$$\omega'(t) \le \alpha t + \beta \le \alpha(t+h) + \beta \le \omega'(t+h) + \varepsilon.$$
Since $\varepsilon > 0$ was arbitrary, we have $\omega'(t) \le \omega'(t+h)$, and $\omega'$ is increasing.

Now we show that $\omega'$ is continuous. Let $t \ge 0$. Since $\omega'$ is concave,
$$\omega'\big(\lambda t + (1-\lambda)x\big) \ge \lambda \omega'(t) + (1-\lambda)\omega'(x)$$
for $\lambda \in [0,1]$. Taking $x = 0$ and letting $\lambda$ tend to one, we have
$$\omega'(t-) \ge \omega'(t),$$
at least if $t \ne 0$. Since $\omega'$ is increasing, $\omega'(t-) = \omega'(t)$. On the other hand,
$$\omega'(t) = \omega'\Big(\lambda(t-\lambda) + (1-\lambda)\Big(t + \frac{\lambda^2}{1-\lambda}\Big)\Big) \ge \lambda \omega'(t-\lambda) + (1-\lambda)\omega'\Big(t + \frac{\lambda^2}{1-\lambda}\Big).$$
Taking $\lambda$ to zero yields $\omega'(t) \ge \omega'(t+)$. Again since $\omega'$ is increasing,
$$\omega'(t+) = \omega'(t) = \omega'(t-).$$
In particular, $\omega'$ is continuous.

It remains to show that $\omega_{\omega'} = \omega'$. We show that $\omega'$ is subadditive; the proof is then analogous to the proof of Proposition 2.2.10. Since $\omega'$ is concave, we have
$$\omega'(\lambda x) = \omega'\big(\lambda x + (1-\lambda) \cdot 0\big) \ge \lambda \omega'(x) + (1-\lambda)\omega'(0) = \lambda \omega'(x)$$
for $x \ge 0$, $\lambda \in [0,1]$. Let $s, t \ge 0$. Then
$$\omega'(s+t) = \frac{s}{s+t}\,\omega'(s+t) + \frac{t}{s+t}\,\omega'(s+t) \le \omega'\Big(\frac{s}{s+t}(s+t)\Big) + \omega'\Big(\frac{t}{s+t}(s+t)\Big) = \omega'(s) + \omega'(t).$$

We mainly exploit the following property of concave functions.
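For sampled data, the concave majorant appearing in Lemma 2.2.14 can be computed as the upper convex hull of the graph. The following sketch (our own numerical illustration; the function name is ours) checks on the non-concave modulus $\omega(h) = h^2$ on $[0,1]$ that the majorant is the chord $h \mapsto h$:

```python
import numpy as np

def concave_majorant(xs, ys):
    """Least concave majorant of sampled points (xs[i], ys[i]), computed as
    the upper convex hull of the graph (xs must be increasing)."""
    hull = []  # indices of upper-hull vertices, left to right
    for i in range(len(xs)):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            # drop the middle point if it lies on or below the chord i1 -> i
            cross = (xs[i2] - xs[i1]) * (ys[i] - ys[i1]) \
                  - (ys[i2] - ys[i1]) * (xs[i] - xs[i1])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(xs, xs[hull], ys[hull])

xs = np.linspace(0.0, 1.0, 101)
omega = xs ** 2                       # increasing, omega(0) = 0, not concave
maj = concave_majorant(xs, omega)
assert np.all(maj >= omega - 1e-12)   # the majorant dominates omega
assert np.allclose(maj, xs)           # majorant of h^2 on [0, 1] is the chord h
```

On $[0,1]$ this recovers exactly the infimum over affine functions lying above $\omega$, restricted to the sampled interval.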
Lemma 2.2.15. Let $\mathcal{I}$ be a bounded or unbounded interval, and let $g : \mathcal{I} \to \mathbb{R}$ be a concave function. Let $x, y, x+h, y+h \in \mathcal{I}$ with $x \ge y$ and $h \ge 0$. Then
$$g(x+h) - g(x) \le g(y+h) - g(y).$$

Proof. By the definition of concavity, the graph of $g$ on the interval $[y, y+h]$ lies "above" the secant
$$s(t) := g(y) + (t-y)\,\frac{g(y+h) - g(y)}{h}.$$
Indeed,
$$s(t) = \frac{y+h-t}{h}\,g(y) + \frac{t-y}{h}\,g(y+h) \le g\Big(\frac{y+h-t}{h}\,y + \frac{t-y}{h}\,(y+h)\Big) = g(t)$$
for $t \in [y, y+h]$. We show that on $\mathcal{I} \setminus [y, y+h]$ the graph of $g$ lies "below" the secant $s$. Suppose that there exists a $t \in \mathcal{I} \setminus [y, y+h]$ such that $g(t) > s(t)$, and assume without loss of generality that $t > y+h$. Let $u \in (y, y+h)$ and let $\lambda \in [0,1]$ be such that
$$y + h = \lambda t + (1-\lambda)u.$$
Then
$$s(y+h) = \lambda s(t) + (1-\lambda)s(u) < \lambda g(t) + (1-\lambda)g(u) \le g(y+h) = s(y+h),$$
a contradiction.

To prove the statement of the lemma, we distinguish two different cases. First, assume that $y \le x \le y+h \le x+h$. Let $s$ be defined as above. Since $s$ is affine,
$$g(y+h) - g(y) = s(y+h) - s(y) = s(x+h) - s(x) \ge g(x+h) - g(x).$$
On the other hand, assume that $y \le y+h \le x \le x+h$. Inductively applying the first case, we have
$$g(y+h) - g(y) \ge g(y+2h) - g(y+h) \ge g(y+3h) - g(y+2h) \ge \dots$$
For some $k \in \mathbb{N}$, we have $y + kh \le x \le y + (k+1)h \le x+h$. Again, we apply the first case and have
$$g(y+h) - g(y) \ge g(y+(k+1)h) - g(y+kh) \ge g(x+h) - g(x).$$

Finally, we prove that concave functions are almost Lipschitz continuous.

Lemma 2.2.16. Let $g : [0,1] \to \mathbb{R}$ be a concave increasing function. Then $g$ is Lipschitz continuous on all intervals $[\varepsilon, 1]$ with $\varepsilon \in (0,1)$.

Proof. Let $\varepsilon \in (0,1)$ and let $s$ be the secant through the points $(0, g(0))$ and $(\varepsilon, g(\varepsilon))$. We write $s(t) = \alpha t + g(0)$ for the correct value of $\alpha$. In the proof of Lemma 2.2.15, we have shown that $s(t) \le g(t)$ for $t \in [0, \varepsilon]$ and $s(t) \ge g(t)$ for $t \in [\varepsilon, 1]$. Let $\varepsilon \le x \le y \le 1$ and let $s'$ be the secant through the points $(0, g(0))$ and $(x, g(x))$.
Again write $s'(t) = \alpha' t + g(0)$ for the correct value of $\alpha'$. Since $g$ is concave,
$$s'(\varepsilon) \le g(\varepsilon) = s(\varepsilon).$$
Therefore, $0 \le \alpha' \le \alpha$. Since $g$ is increasing and concave,
$$|g(y) - g(x)| = g(y) - g(x) \le s'(y) - s'(x) = \alpha'(y - x) \le \alpha|y - x|.$$
Hence, $g$ is Lipschitz continuous with Lipschitz constant $\alpha$ on $[\varepsilon, 1]$.

We now prove that we cannot make any reasonable conclusion on the modulus of continuity of the variation function if we only know the modulus of continuity of the parent function.

Theorem 2.2.17. Let $\omega, \omega'$ be two moduli of continuity such that
$$\lim_{h \to 0} \frac{\omega(h)}{h} = \infty,$$
and $\omega'$ is bounded. Then there exists a function $f : [0,1] \to \mathbb{R}$ of bounded variation such that $\omega_f \le \omega$ and $\omega_{\mathrm{Var}_f} \ge \omega'$.

Remark 2.2.18. The condition on $\omega$ is necessary, as otherwise $f$ is Lipschitz continuous, which again implies that $\mathrm{Var}_f$ is Lipschitz continuous by Theorem 2.2.5. The condition on $\omega'$ is necessary, since $f$ needs to be of bounded variation, and thus $\mathrm{Var}_f$ and $\omega_{\mathrm{Var}_f}$ are bounded as well.

Proof. Using Lemma 2.2.12, Lemma 2.2.13 and Lemma 2.2.14, we can assume without loss of generality that $\omega'(h) = \omega'(1)$ for $h \ge 1$, $\omega_{\omega'} = \omega'$, and $\omega'$ is concave.

Define the function $V : [0,1] \to \mathbb{R}$, $V(x) = \omega'(x)$. Then $\omega_V = \omega'$. We inductively construct a non-negative function $f$ on the intervals $[x_1, x_0], [x_2, x_1], \dots$ with $x_0 = 1$ and $x_n \to 0$, such that $\omega$ is a modulus of continuity for $f$ and $\mathrm{Var}_f = V$.

Assume we have already constructed $f$ on the interval $[x_n, 1]$. If $x_n = 0$, we have already defined $f$ on the entire interval $[0,1]$. Otherwise, we define $x_{n+1}$ and construct $f$ on the interval $[x_{n+1}, x_n]$. First, to every point $x \in [0, x_n]$, we assign a point $y_x \in [x, x_n]$ with the property that
$$V(y_x) = \frac{V(x) + V(x_n)}{2}.$$
Such a point $y_x$ exists, since $V$ is increasing and continuous. Define the set
$$A_{n+1} := \big\{x \in [0, x_n] : V(x+h) - V(x) \le \omega(h) \text{ for all } h \in [0, y_x - x]\big\}.$$
Since both $V$ and $\omega$ are continuous, the set $A_{n+1}$ is closed, and thus compact.
It is non-empty since $x_n \in A_{n+1}$. Therefore,
$$x_{n+1} := \inf A_{n+1} \in A_{n+1}.$$
Furthermore, we define $y_{n+1} := y_{x_{n+1}}$. Finally, we define the function $f$ on $[x_{n+1}, x_n]$ as
$$f(z) = \begin{cases} V(z) - V(x_{n+1}) & z \in [x_{n+1}, y_{n+1}] \\ V(x_n) - V(z) & z \in [y_{n+1}, x_n]. \end{cases}$$
We note some simple facts about the function $f$. We always have $f(x_n) = 0$ and
$$f(y_n) = \frac{V(x_{n-1}) - V(x_n)}{2}.$$
Since $V$ is continuous, $f$ is continuous where it is defined. Since $V$ is increasing, $f$ is piecewise monotone; $f$ is increasing on the intervals $[x_n, y_n]$ and decreasing on the intervals $[y_{n+1}, x_n]$. Since $V$ is concave, $f$ is concave on the intervals $[x_n, y_n]$ and convex on the intervals $[y_{n+1}, x_n]$.

[Figure: the parent function $f$ (blue) and the variation function $V$ (red) on the interval $[x_n, x_{n-1}]$, with the points $x_n$, $y_{x_n} = y_n$ and $x_{n-1}$ marked on the horizontal axis.]

The above picture shows such a function $f$ on the interval $[x_n, x_{n-1}]$. The red function is the variation function $V$, the blue function is the parent function $f$. On the interval $[x_n, y_n]$, $f(z) = V(z) + c$, and on the interval $[y_n, x_{n-1}]$, $f(z) = -V(z) + c'$. This construction already suggests that $\mathrm{Var}_f = V$. The constants $c$ and $c'$ are chosen such that $f(x_n) = f(x_{n-1}) = 0$, and the point $y_n$ is chosen such that $f$ is continuous. The point $x_n$ is chosen such that $\omega$ is a modulus of continuity for $f$ (a priori at least on the interval $[x_n, y_n]$).

The remaining proof is split into four steps. First, we show that $(x_n)$ converges to zero. Hence, we have defined the function $f$ on the interval $(0,1]$. Second, we prove that $f(0+) = 0$, and, therefore, extend $f$ continuously to $[0,1]$ with $f(0) = 0$. Then, we show that $\omega_f \le \omega$ and finally, we prove that $\mathrm{Var}_f = V$.

1. Clearly, $(x_n)$ is decreasing and bounded from below by zero. Thus, $(x_n)$ converges, say to the point $x \in [0,1]$. Assume that $x \ne 0$. Since $V$ is concave, it is Lipschitz continuous with constant $L$ on $[x/2, 1]$ by Lemma 2.2.16. Since
$$\lim_{h \to 0} \frac{\omega(h)}{h} = \infty,$$
there exists an $\varepsilon > 0$ such that $\omega(h) \ge Lh$ for all $h \in [0, \varepsilon]$. Let $n \in \mathbb{N}$ be sufficiently large such that $0 \le x_n - x \le \varepsilon/2$. Define $z_{n+1} := \max\{x/2,\ x_n - \varepsilon\} \in [x/2, 1]$.
Then
$$V(z_{n+1} + h) - V(z_{n+1}) \le Lh \le \omega(h)$$
for $h \in [0, \varepsilon]$. Hence, $z_{n+1} \in A_{n+1}$ and $x_{n+1} = \min A_{n+1} \le z_{n+1} < x$, a contradiction. Therefore, $(x_n)$ converges to zero. In particular, we have also shown that $(x_n)$ is strictly decreasing.

2. If the sequence $(x_n)$ is finite, this statement is trivial, since then $x_n = 0$ for some $n \in \mathbb{N}$. If $(x_n)$ is infinite, it suffices to show that $f(y_n)$ converges to zero. Suppose this is not the case. Then there exists an $\varepsilon > 0$ such that $f(y_n) \ge \varepsilon$ for infinitely many $n \in \mathbb{N}$. Let $(y_{n_k})_k$ be a subsequence of $(y_n)$ with $f(y_{n_k}) \ge \varepsilon$. Since $V$ is increasing,
$$V(1) - V(0) \ge V(y_{n_1}) - V(x_{n_k}) \ge \sum_{j=1}^{k} \big(V(y_{n_j}) - V(x_{n_j})\big) = \sum_{j=1}^{k} f(y_{n_j}) \ge k\varepsilon$$
for all $k \in \mathbb{N}$, a contradiction. Hence, $f(0+) = 0$ and we extend $f$ continuously to $[0,1]$ with $f(0) = 0$.

3. Let $h \ge 0$. We show that $\omega_f(h) \le \omega(h)$, i.e.
$$\sup\big\{|f(x) - f(y)| : x, y \in [0,1],\ |x-y| \le h\big\} \le \omega(h).$$
Since $\omega$ is increasing, it suffices to show that
$$\sup\big\{|f(x) - f(y)| : x, y \in [0,1],\ |x-y| = h\big\} \le \omega(h).$$
This in turn is equivalent to
$$\sup\big\{|f(x+h) - f(x)| : x \in [0, 1-h]\big\} \le \omega(h).$$
Let $x \in [0, 1-h]$. It remains to show that
$$|f(x+h) - f(x)| \le \omega(h).$$
We also write $y$ instead of $x+h$. We distinguish several different cases depending on the positions of $x$ and $y$ relative to the points $x_n$ and $y_n$. To every point $z \in (0,1]$, we can assign $n(z) \in \mathbb{N}$ such that $x_{n(z)} < z \le x_{n(z)-1}$. The special case $x = 0$ is treated at the very end as Case 3.

Case 1. We have $n := n(x) = n(y)$. We distinguish whether $x, y$ are in the intervals $[x_n, y_n]$ or $[y_n, x_{n-1}]$.

Case 1.1. We have $x, y \in [x_n, y_n]$. Using Lemma 2.2.15,
$$|f(y) - f(x)| = \big|\big(V(y) - V(x_n)\big) - \big(V(x) - V(x_n)\big)\big| = V(y) - V(x) = V(x+h) - V(x) \le V(x_n + h) - V(x_n) \le \omega(h).$$

Case 1.2. We have $x, y \in [y_n, x_{n-1}]$. Here, we need an additional distinction on the distance $h = y - x$.

Case 1.2.1. Assume that $h \le y_n - x_n$.
Using Lemma 2.2.15 and Case 1.1,
$$|f(y) - f(x)| = \big|\big(V(x_{n-1}) - V(y)\big) - \big(V(x_{n-1}) - V(x)\big)\big| = V(y) - V(x) = V(x+h) - V(x) \le V(x_n + h) - V(x_n) \le \omega(h).$$

Case 1.2.2. Assume that $h \ge y_n - x_n$. Using the defining property of $y_n$,
$$|f(y) - f(x)| = \big|\big(V(x_{n-1}) - V(y)\big) - \big(V(x_{n-1}) - V(x)\big)\big| = V(y) - V(x) \le V(x_{n-1}) - V(y_n) = V(y_n) - V(x_n) \le \omega(y_n - x_n) \le \omega(h).$$

Case 1.3. We have $x \in [x_n, y_n]$ and $y \in [y_n, x_{n-1}]$. Using the preceding cases,
$$|f(y) - f(x)| \le \max\big\{|f(y) - f(y_n)|,\ |f(x) - f(y_n)|\big\} \le \max\big\{\omega(y - y_n),\ \omega(y_n - x)\big\} = \omega\big(\max\{y - y_n,\ y_n - x\}\big) \le \omega(h).$$

Case 2. We have $m := n(y) < n(x) =: n$. We again distinguish several different cases and reduce them all to Case 1.

Case 2.1. We have $x \in [x_n, y_n]$.

Case 2.1.1. We have $y \in [x_m, y_m]$.

Case 2.1.1.1. We have $f(x) \le f(y)$. Then,
$$|f(y) - f(x)| = f(y) - f(x) \le f(y) = f(y) - f(x_m) = |f(y) - f(x_m)| \le \omega(y - x_m) \le \omega(y - x) = \omega(h).$$

Case 2.1.1.2. We have $f(y) \le f(x)$. Then,
$$|f(y) - f(x)| = f(x) - f(y) \le f(x) = f(x) - f(x_{n-1}) = |f(x) - f(x_{n-1})| \le \omega(x_{n-1} - x) \le \omega(y - x) = \omega(h).$$

Case 2.1.2. We have $y \in [y_m, x_{m-1}]$.

Case 2.1.2.1. We have $f(x) \le f(y)$. Then,
$$|f(y) - f(x)| = f(y) - f(x) \le f(y) = f(y) - f(x_m) = |f(y) - f(x_m)| \le \omega(y - x_m) \le \omega(y - x) = \omega(h).$$

Case 2.1.2.2. We have $f(y) \le f(x)$. Then,
$$|f(y) - f(x)| = f(x) - f(y) \le f(x) = f(x) - f(x_{n-1}) = |f(x) - f(x_{n-1})| \le \omega(x_{n-1} - x) \le \omega(y - x) = \omega(h).$$

Case 2.2. We have $x \in [y_n, x_{n-1}]$.

Case 2.2.1. We have $y \in [x_m, y_m]$.

Case 2.2.1.1. We have $f(x) \le f(y)$. Then,
$$|f(y) - f(x)| = f(y) - f(x) \le f(y) = f(y) - f(x_m) = |f(y) - f(x_m)| \le \omega(y - x_m) \le \omega(y - x) = \omega(h).$$

Case 2.2.1.2. We have $f(y) \le f(x)$. Then,
$$|f(y) - f(x)| = f(x) - f(y) \le f(x) = f(x) - f(x_{n-1}) = |f(x) - f(x_{n-1})| \le \omega(x_{n-1} - x) \le \omega(y - x) = \omega(h).$$

Case 2.2.2. We have $y \in [y_m, x_{m-1}]$.

Case 2.2.2.1. We have $f(x) \le f(y)$. Then,
$$|f(y) - f(x)| = f(y) - f(x) \le f(y) = f(y) - f(x_m) = |f(y) - f(x_m)| \le \omega(y - x_m) \le \omega(y - x) = \omega(h).$$

Case 2.2.2.2. We have $f(y) \le f(x)$.
Then,
$$|f(y) - f(x)| = f(x) - f(y) \le f(x) = f(x) - f(x_{n-1}) = |f(x) - f(x_{n-1})| \le \omega(x_{n-1} - x) \le \omega(y - x) = \omega(h).$$

Case 3. We have $x = 0$. Define $n := n(h) = n(y)$. Then,
$$|f(y) - f(x)| = f(h) = f(h) - f(x_n) = |f(h) - f(x_n)| \le \omega(h - x_n) \le \omega(h).$$

4. Using Lemma 2.2.3 and that $f$ is continuous at zero and piecewise monotone, we have for $x \in [0,1]$ with $x_n \le x \le y_n$ that
$$\mathrm{Var}_f(x) = \mathrm{Var}(f; 0, x) = \sum_{k=n}^{\infty} \Big(\mathrm{Var}(f; x_{k+1}, y_{k+1}) + \mathrm{Var}(f; y_{k+1}, x_k)\Big) + \mathrm{Var}(f; x_n, x)$$
$$= \sum_{k=n}^{\infty} \Big(f(y_{k+1}) - f(x_{k+1}) + f(y_{k+1}) - f(x_k)\Big) + f(x) - f(x_n)$$
$$= 2\sum_{k=n}^{\infty} f(y_{k+1}) + f(x) = 2\sum_{k=n}^{\infty} \frac{V(x_k) - V(x_{k+1})}{2} + V(x) - V(x_n)$$
$$= -\lim_{k \to \infty} V(x_{k+1}) + V(x_n) + V(x) - V(x_n) = V(x).$$
Similarly, for $y_{n+1} \le x \le x_n$, we have
$$\mathrm{Var}_f(x) = \mathrm{Var}(f; 0, x) = \sum_{k=n+1}^{\infty} \Big(\mathrm{Var}(f; x_{k+1}, y_{k+1}) + \mathrm{Var}(f; y_{k+1}, x_k)\Big) + \mathrm{Var}(f; x_{n+1}, y_{n+1}) + \mathrm{Var}(f; y_{n+1}, x)$$
$$= \sum_{k=n+1}^{\infty} \Big(f(y_{k+1}) - f(x_{k+1}) + f(y_{k+1}) - f(x_k)\Big) + f(y_{n+1}) - f(x_{n+1}) + f(y_{n+1}) - f(x)$$
$$= 2\sum_{k=n}^{\infty} f(y_{k+1}) - f(x) = 2\sum_{k=n}^{\infty} \frac{V(x_k) - V(x_{k+1})}{2} - \big(V(x_n) - V(x)\big)$$
$$= -\lim_{k \to \infty} V(x_{k+1}) + V(x_n) - V(x_n) + V(x) = V(x).$$

2.3 Decomposition into monotone functions

The main result of this section is that we can decompose functions of bounded variation into the difference of two monotone functions. We can even state such a decomposition explicitly. Throughout this section, we only consider functions defined on a fixed interval $\mathcal{I} = [a,b]$.

In Example 2.1.5 we have seen that monotone functions are of bounded total variation. It is easily seen that linear combinations of functions of bounded variation are again of bounded variation.

Proposition 2.3.1. If $f$ and $g$ are of bounded variation and if $\alpha, \beta \in \mathbb{R}$, then
$$\mathrm{Var}(\alpha f + \beta g) \le |\alpha| \mathrm{Var}(f) + |\beta| \mathrm{Var}(g).$$
In particular, the set $\mathrm{BV}(\mathcal{I})$ is a vector space.

Proof. Let $f, g \in \mathrm{BV}$ and let $\alpha, \beta \in \mathbb{R}$.
Then
$$\mathrm{Var}(\alpha f + \beta g) = \sup_{Y \in \mathcal{Y}} \sum_{y \in Y} |(\alpha f + \beta g)(y_+) - (\alpha f + \beta g)(y)|$$
$$= \sup_{Y \in \mathcal{Y}} \sum_{y \in Y} |\alpha f(y_+) - \alpha f(y) + \beta g(y_+) - \beta g(y)|$$
$$\le \sup_{Y \in \mathcal{Y}} \sum_{y \in Y} \big(|\alpha|\,|f(y_+) - f(y)| + |\beta|\,|g(y_+) - g(y)|\big)$$
$$\le |\alpha| \sup_{Y \in \mathcal{Y}} \sum_{y \in Y} |f(y_+) - f(y)| + |\beta| \sup_{Y \in \mathcal{Y}} \sum_{y \in Y} |g(y_+) - g(y)|$$
$$= |\alpha| \mathrm{Var}(f) + |\beta| \mathrm{Var}(g) < \infty.$$

Thus, the difference of two monotone functions is again of bounded variation. The following theorem states that the converse is also true, i.e. all functions of bounded variation can be written as the difference of two increasing functions. This theorem is of fundamental importance, since it enables us to extend many results for monotone functions to functions of bounded variation. It is also called the Jordan Decomposition Theorem and is due to Jordan, who was the first to introduce functions of bounded variation (see for example [45]).

Theorem 2.3.2 (Jordan Decomposition Theorem). If $f : [a,b] \to \mathbb{R}$ is of bounded variation, then there are increasing functions $f^+, f^- : [a,b] \to \mathbb{R}$ with $f^+(a) = f^-(a) = 0$ and
$$f(x) - f(a) = f^+(x) - f^-(x), \qquad \mathrm{Var}_f(x) = f^+(x) + f^-(x). \qquad (2.7)$$
Furthermore, this decomposition is unique and the functions satisfy
$$\mathrm{Var}(f^+ - f^-) = \mathrm{Var}(f^+ + f^-) = \mathrm{Var}(f^+) + \mathrm{Var}(f^-) = \mathrm{Var}(f) = \mathrm{Var}(\mathrm{Var}_f).$$
If $f$ is right-continuous, then also $f^+$ and $f^-$ are right-continuous. Similar statements hold for left-continuous and continuous $f$.

Proof. We can reformulate the equations (2.7) as
$$f^+(x) = \frac{1}{2}\big(\mathrm{Var}_f(x) + f(x) - f(a)\big),$$
$$f^-(x) = \frac{1}{2}\big(\mathrm{Var}_f(x) - f(x) + f(a)\big).$$
The uniqueness is apparent from this representation and the claims about the continuity follow from Theorem 2.2.5. It remains to show that $f^+$ and $f^-$ are increasing. We show that $\mathrm{Var}_f(x) \pm f(x)$ is increasing. Take $\varepsilon > 0$ and $x_1, x_2 \in [a,b]$ with $x_1 < x_2$, and let $Y$ be a partition of $[a, x_1]$ with
$$\sum_{y \in Y} |f(y_+) - f(y)| \ge \mathrm{Var}_f(x_1) - \varepsilon.$$
Then
$$\mathrm{Var}_f(x_2) \pm f(x_2) \ge \sum_{y \in Y} |f(y_+) - f(y)| + |f(x_2) - f(x_1)| \pm f(x_2)$$
$$\ge \mathrm{Var}_f(x_1) - \varepsilon + |f(x_2) - f(x_1)| \pm \big(f(x_2) - f(x_1)\big) \pm f(x_1) \ge \mathrm{Var}_f(x_1) - \varepsilon \pm f(x_1).$$
Since $\varepsilon > 0$ is arbitrary, we get
$$\mathrm{Var}_f(x_2) \pm f(x_2) \ge \mathrm{Var}_f(x_1) \pm f(x_1).$$
Finally, $\mathrm{Var}(f) = \mathrm{Var}(f - f(a)) = \mathrm{Var}(f^+ - f^-)$. Furthermore, by Proposition 2.2.2, $\mathrm{Var}(f) = \mathrm{Var}(\mathrm{Var}_f) = \mathrm{Var}(f^+ + f^-)$. Since $f^+$ and $f^-$ are increasing, $f^+ + f^-$ is increasing as well, and thus
$$\mathrm{Var}(f^+ + f^-) = (f^+ + f^-)(b) - (f^+ + f^-)(a) = \big(f^+(b) - f^+(a)\big) + \big(f^-(b) - f^-(a)\big) = \mathrm{Var}(f^+) + \mathrm{Var}(f^-).$$

Remark 2.3.3. The functions $f^+$ and $f^-$ in the above theorem are called the positive and negative variation functions of $f$, respectively. Notice that the Jordan Decomposition Theorem implies that $\mathrm{BV}$ is the linear hull of the monotone functions (which do not form a vector space on their own).

2.4 Continuity, differentiability and measurability

We have seen in Example 2.1.9 that there are differentiable functions that are of unbounded variation. On the other hand, Example 2.1.4 shows us that there are discontinuous functions that are of bounded variation. In light of these examples, we want to examine the connection between bounded variation and continuity and differentiability more closely. All proofs in this chapter use the monotone decomposition of functions of bounded variation. So we always first prove the corresponding statements for increasing functions, and then transfer them to functions of bounded variation.

First, recall that points of discontinuity of a function can be classified into different types. The two most interesting types for our studies are the removable discontinuities and the discontinuities of jump type. A function $f$ has a removable discontinuity at $x_0$ if $f(x_0-)$ and $f(x_0+)$ exist and are finite, and if $f(x_0-) = f(x_0+) \ne f(x_0)$. Those discontinuities are called removable, since they can be removed by redefining $f$ at $x_0$ as $f(x_0-)$ (or $f(x_0+)$). A discontinuity of $f$ at $x_0$ is of jump type if $f(x_0+)$ and $f(x_0-)$ exist and are finite, but different.
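The explicit formulas for $f^+$ and $f^-$ in the proof of Theorem 2.3.2 lend themselves to a direct numerical check. The following sketch (our own illustration, approximating $\mathrm{Var}_f$ by cumulative sums of increments along a grid; the sample function is an arbitrary choice) verifies the decomposition:

```python
import numpy as np

# Discrete check of the Jordan decomposition (Theorem 2.3.2) on a grid:
# Var_f is approximated by cumulative sums of |increments|, and
# f+ = (Var_f + f - f(a)) / 2,  f- = (Var_f - f + f(a)) / 2.
xs = np.linspace(0.0, 1.0, 1000)
f = np.sin(7 * xs) + 0.3 * xs                 # an arbitrary BV sample function
var_f = np.concatenate([[0.0], np.cumsum(np.abs(np.diff(f)))])
f_plus = 0.5 * (var_f + f - f[0])
f_minus = 0.5 * (var_f - f + f[0])

assert f_plus[0] == 0.0 and f_minus[0] == 0.0  # both vanish at a
assert np.all(np.diff(f_plus) >= -1e-12)       # f+ is increasing
assert np.all(np.diff(f_minus) >= -1e-12)      # f- is increasing
assert np.allclose(f - f[0], f_plus - f_minus)
assert np.allclose(var_f, f_plus + f_minus)
```

Monotonicity of the two parts holds exactly here, since the increments of $f^+$ are $\tfrac{1}{2}(|\Delta f| + \Delta f) \ge 0$ and those of $f^-$ are $\tfrac{1}{2}(|\Delta f| - \Delta f) \ge 0$.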
Other types of discontinuities are, for example, infinite discontinuities (when the function blows up) or mixed discontinuities (when at least one of the one-sided limits does not exist).

Lemma 2.4.1. Let $f : [a,b] \to \mathbb{R}$ be an increasing function. Then the number of discontinuities of $f$ is countable and they are all of jump type. Furthermore, $f$ is Borel-measurable.

Proof. Let $f$ be discontinuous at $x_0$. Since $f$ is increasing and bounded, the limits
$$f(x_0-) = \lim_{\varepsilon \downarrow 0} f(x_0 - \varepsilon) \quad \text{and} \quad f(x_0+) = \lim_{\varepsilon \downarrow 0} f(x_0 + \varepsilon)$$
exist and are finite. If the limits are equal, then $f$ has a removable discontinuity at $x_0$. It is easily seen that this is a contradiction to the monotonicity of $f$. Therefore, the discontinuity is of jump type. Again since $f$ is increasing, we have $f(x_0-) < f(x_0+)$. Since there is a different rational number in each of the intervals $(f(x_0-), f(x_0+))$ when $x_0$ is a discontinuity of $f$, the set of discontinuities must be countable. The measurability follows immediately from the fact that the sets $f^{-1}((-\infty, \alpha))$ are intervals.

Theorem 2.4.2. Functions of bounded variation have at most countably many discontinuities. Those discontinuities are removable or of jump type. Furthermore, functions of bounded variation are Borel-measurable.

Proof of Theorem 2.4.2. By Theorem 2.3.2 we can write functions of bounded variation as the difference of two monotonically increasing functions. By Lemma 2.4.1, those functions only have a countable number of discontinuities and are Borel-measurable. Thus, functions of bounded variation can also only have a countable number of discontinuities, and they are Borel-measurable as well. Furthermore, since one-sided limits of increasing functions exist, they also exist for functions of bounded variation, proving that the points of discontinuity are either removable or of jump type.

Next, we show that functions of bounded variation are differentiable almost everywhere. In our proof, we follow Royden [44] closely.

Definition 2.4.3.
Let $A \subset \mathbb{R}$ be a set and let $\mathcal{J}$ be a collection of non-degenerate intervals (i.e. we only consider intervals with infinitely many points) covering $A$. Then $\mathcal{J}$ is called a Vitali-cover of $A$ if for all $x \in A$ and $\varepsilon > 0$ there exists an interval $I \in \mathcal{J}$ such that $x \in I$ and $\lambda(I) < \varepsilon$.

Lemma 2.4.4 (Vitali Covering Lemma). Let $A \subset \mathbb{R}$ be a set of finite outer measure and let $\mathcal{J}$ be a Vitali-cover of $A$. Then for all $\varepsilon > 0$ there exists a finite collection $I_1, \dots, I_n$ of pairwise disjoint intervals in $\mathcal{J}$ such that
$$\lambda^*\Big(A \setminus \bigcup_{i=1}^{n} I_i\Big) < \varepsilon.$$

Proof. We can assume without loss of generality that all the intervals in $\mathcal{J}$ are closed. Otherwise, we replace them by their closure and note that the set of the endpoints of $I_1, \dots, I_n$ has measure zero.

Let $U$ be an open set of finite measure containing $A$. Since $\mathcal{J}$ is a Vitali-cover, we assume without loss of generality that $U$ contains all the intervals in $\mathcal{J}$. We construct the sequence $I_1, \dots, I_n$ inductively. Choose $I_1$ in $\mathcal{J}$ arbitrarily. Suppose we have already determined $I_1, \dots, I_k$.

First, assume that there is no interval $I \in \mathcal{J}$ that is disjoint from the intervals $I_1, \dots, I_k$. We show that then
$$A \subset \bigcup_{i=1}^{k} I_i. \qquad (2.8)$$
Indeed, let $x \in A \setminus \bigcup_{i=1}^{k} I_i$. Since $\bigcup_{i=1}^{k} I_i$ is closed, there exists a $\delta > 0$ such that
$$(x - \delta, x + \delta) \cap \bigcup_{i=1}^{k} I_i = \emptyset.$$
Since $\mathcal{J}$ is a Vitali-cover, there exists an interval $I \in \mathcal{J}$ with $x \in I$ and $\lambda(I) < \delta$. For this interval, we have
$$I \cap \bigcup_{i=1}^{k} I_i \subset (x - \delta, x + \delta) \cap \bigcup_{i=1}^{k} I_i = \emptyset.$$
This is a contradiction to our assumption that no such disjoint interval exists. Hence, (2.8) holds, which proves the lemma.

On the other hand, assume that there exists an interval $I \in \mathcal{J}$ that is disjoint from $I_1, \dots, I_k$. Let $a_k$ be the supremum over all the lengths of the intervals in $\mathcal{J}$ that are disjoint from $I_1, \dots, I_k$. Since each interval in $\mathcal{J}$ is contained in $U$, it is clear that $a_k \le \lambda(U) < \infty$. Now choose $I_{k+1}$ as an interval in $\mathcal{J}$ that is disjoint from $I_1, \dots, I_k$ and satisfies $\lambda(I_{k+1}) \ge a_k/2$.
With the above procedure we get a sequence $(I_k)$ of pairwise disjoint intervals in $\mathcal{J}$. Since
$$\sum_{k=1}^{\infty} \lambda(I_k) = \lambda\Big(\bigcup_{k=1}^{\infty} I_k\Big) \le \lambda(U) < \infty, \qquad (2.9)$$
the series converges and there exists an $n \in \mathbb{N}$ such that
$$\sum_{k=n+1}^{\infty} \lambda(I_k) < \varepsilon/5.$$
It remains to show that $\lambda^*(R) < \varepsilon$ with
$$R = A \setminus \bigcup_{k=1}^{n} I_k.$$
Let $x \in R$. Since $\bigcup_{k=1}^{n} I_k$ is closed and $\mathcal{J}$ is a Vitali-cover, there exists an interval $I$ in $\mathcal{J}$ small enough such that $x \in I$ and $I$ is disjoint from $I_1, \dots, I_n$. We show that $I$ intersects some $I_k$ for $k$ large enough. Indeed, if $I$ is disjoint from all $I_k$, then $a_k \ge \lambda(I)$ for all $k \in \mathbb{N}$. Hence, we have $\lambda(I_k) \ge \lambda(I)/2$ for all $k \in \mathbb{N}$. This is a contradiction to (2.9). Therefore, $I$ intersects some $I_k$.

Let $k$ be the smallest integer such that $I$ intersects $I_k$. Then $k > n$ and
$$\lambda(I) \le a_{k-1} \le 2\lambda(I_k).$$
Since $x \in I$ and since $I$ intersects $I_k$, the distance of $x$ to the midpoint of $I_k$ is at most
$$\lambda(I) + \frac{1}{2}\lambda(I_k) \le \frac{5}{2}\lambda(I_k).$$
Therefore, if we define $J_k$ to be $I_k$ stretched by a factor of $5$ with the same midpoint, then $x \in J_k$. Hence,
$$R \subset \bigcup_{k=n+1}^{\infty} J_k$$
and therefore
$$\lambda^*(R) \le \sum_{k=n+1}^{\infty} \lambda(J_k) = 5 \sum_{k=n+1}^{\infty} \lambda(I_k) < \varepsilon,$$
which is what had to be shown.

We use this lemma to prove that increasing functions are differentiable almost everywhere. To this end, we define four different derivatives of a function $f$ at $x$ as follows:
$$D^+ f(x) := \limsup_{h \downarrow 0} \frac{f(x+h) - f(x)}{h}, \qquad D^- f(x) := \limsup_{h \downarrow 0} \frac{f(x) - f(x-h)}{h},$$
$$D_+ f(x) := \liminf_{h \downarrow 0} \frac{f(x+h) - f(x)}{h}, \qquad D_- f(x) := \liminf_{h \downarrow 0} \frac{f(x) - f(x-h)}{h}.$$
Of course, $f$ is differentiable at $x$ if and only if $D^+ f(x) = D^- f(x) = D_+ f(x) = D_- f(x) \ne \pm\infty$.

Lemma 2.4.5 (Lebesgue). Let $f : [a,b] \to \mathbb{R}$ be an increasing function. Then $f$ is differentiable almost everywhere, the derivative $f'$ is Lebesgue-measurable, and
$$\int_a^b f'(x)\,dx \le f(b) - f(a).$$

Proof. We first show that the sets where two of the introduced derivatives differ have outer measure zero. Let $A$ be the set on which $D^+ f(x) > D_- f(x)$; the other cases follow analogously.
It is clear that we can write $A$ as the union of the sets
$$A_{p,q} := \big\{D^+ f > p > q > D_- f\big\}$$
for $p, q \in \mathbb{Q}$, so it suffices to show that $\lambda^*(A_{p,q}) = 0$.

Let $\varepsilon > 0$, choose $p, q \in \mathbb{Q}$ with $p > q$ and denote $s := \lambda^*(A_{p,q})$. Take an open set $U \supset A_{p,q}$ with $\lambda(U) < s + \varepsilon$.