
Spectral gap and asymptotics for a family of cocycles of Perron-Frobenius operators

by

Joseph Anthony Horan
MSc, University of Victoria, 2015
BMath, University of Waterloo, 2013

A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of

DOCTOR OF PHILOSOPHY

in the Department of Mathematics and Statistics

© Joseph Anthony Horan, 2020
University of Victoria

All rights reserved. This dissertation may not be reproduced in whole or in part, by photocopying or other means, without the permission of the author.

We acknowledge with respect the Lekwungen peoples on whose traditional territory the university stands, and the Songhees, Esquimalt, and W̱SÁNEĆ peoples whose historical relationships with the land continue to this day.


Supervisory Committee

Dr. Christopher Bose, Co-Supervisor (Department of Mathematics and Statistics)

Dr. Anthony Quas, Co-Supervisor (Department of Mathematics and Statistics)

Dr. Sue Whitesides, Outside Member (Department of Computer Science)

ABSTRACT

At its core, a dynamical system is a set of things and rules for how they change. In the study of dynamical systems, we often ask questions about long-term or average phenomena: whether or not there is an equilibrium for the system, and if so, how quickly the system approaches that equilibrium. These questions are more challenging in the non-autonomous (or random) setting, where the rules change over time. The main goal of this dissertation is to develop new tools with which to study random dynamical systems, and demonstrate their application in a non-trivial context. We prove a new Perron-Frobenius theorem for cocycles of bounded linear operators which preserve and sometimes contract a cone in a Banach space; this new theorem provides an explicit upper bound for the second-largest Lyapunov exponent of the cocycle, which determines how quickly the system approaches its equilibrium-like state. Using this theorem and other tools (including a new Lasota-Yorke-type inequality for Perron-Frobenius operators for use with a family of maps), we show that a class of cocycles of piecewise linear maps has a Lyapunov spectral gap (hence answering the equilibrium question in the affirmative), and we moreover have an explicit lower bound on the spectral gap. We also prove asymptotics for a family of cocycles arising from a perturbation of a fixed map with two invariant densities; we obtain a linear upper bound for the second-largest Lyapunov exponent, and the bound is sharp, in the sense that there are members of this family of perturbations where the second-largest Lyapunov exponent is linear in the perturbation parameter. The sharpness example is studied through an in-depth determinant-free linear algebra computation for Markov operators.

Table of Contents

Supervisory Committee
Abstract
Table of Contents
List of Figures
Acknowledgements
Dedication

Chapter 1 Introduction

Chapter 2 Cocycle Perron-Frobenius Theorem
2.1 Cones
2.2 Measurability and Topological Considerations
2.3 Cocycles, Lyapunov Exponents, and the Grassmannian
2.4 Statement of the Main Theorem
2.5 Proof of Theorem 2.4.3 and Corollary 2.4.5
2.6 Easy Applications of Theorem 2.4.3

Chapter 3 Balanced Lasota-Yorke-type Inequality
3.1 Bounded Variation and Setting
3.2 Statement and Proof of the Inequality

Chapter 4 Application to Cocycles of Perron-Frobenius Operators
4.1 Cyclic Decomposition
4.2 Uniform Lasota-Yorke Inequality
4.3 Covering Properties
4.4 Illustration of the Covering Method
4.5 Contraction of the Cone - Spectral Gap for L_ω^(n)
4.6 Perturbation Asymptotics

Chapter 5 Markov Paired Tent Maps
5.1 Markov Maps and Partitions
5.2 Spectral Properties of A_n
5.3 Spectral Properties of a Factor System
5.4 Mixing Rates and Times
5.5 Simultaneous Spectrum via Algebraic Geometry
5.6 Two-Parameter Markov Paired Tent Maps

Chapter 6 Conclusion

Appendix A Assorted Lemmas, Proofs, and Computations
A.1 Miscellaneous Ergodic Theory
A.2 Miscellaneous Tools
A.3 Miscellaneous Examples

Bibliography

List of Figures

Figure 1.1 A Markov chain with four states and its associated transition matrix.
Figure 1.2 The stirring map T that moves chocolate chips around in the banana bread batter [−1, 1].
Figure 1.3 The Perron-Frobenius operator L sends densities f to new densities Lf (schematic only).
Figure 1.4 Schematic of "leaking" behaviour, where ε₁(ω) and ε₂(ω) are generally small.
Figure 2.1 The cone C_{y,α} for y = (1, 2) and α = 2 in R², depicted with the perpendicular plane at cy with c = 1/2.
Figure 2.2 Schematics of cocycles along an orbit of x (both non-invertible and invertible).
Figure 3.1 F_H(f), for f(x) = 36x³ + 4x² − 3x − 0.1 restricted to [−1, 1] and H = [−0.4, 0.3].
Figure 3.2 The setup in Example 3.2.6.
Figure 3.3 Computing L(f) using Lemma 3.2.4.
Figure 3.4 A map T : [0, 1] → [0, 1] with four hanging points, namely: (0.4, −), (0.4, +), (0.6, +), and (1, −).
Figure 4.1 The paired tent map, with parameters ε₁ = 0.3 and ε₂ = 0.7.
Figure 4.2 The second iterate of the coupled tent map, S_ω, with parameters ε₁(ω) = 0.1, ε₂(ω) = 0.2, ε₁(σ(ω)) = 0.1, and ε₂(σ(ω)) = 0.2.
Figure 4.3 The second iterate of the coupled tent map, S_ω, with parameters ε₁(ω) = 0.7, ε₂(ω) = 0.3, ε₁(σ(ω)) = 0.2, and ε₂(σ(ω)) = 0.6.
Figure 4.4 The function f and its image under the Perron-Frobenius operator P_ω. Observe that both functions are supported only on [0, 1].
Figure 4.5 The map T in Remark 4.3.6.
Figure 4.6 A picture of (L_{δ₁,δ₂} − L_{ε₁,ε₂})1_{[−1,−1/2]} for 0 < δ₁ < ε₁. The first jump is of size 1/(2(1 + δ₁)), and the second jump is of size 1/(2(1 + ε₁)), so the variation is the sum of the two jump sizes.
Figure 5.1 The paired tent map T_{κ,κ}, with parameter κ = 0.3.
Figure 5.2 Markov partitions for T_{κ_n}, with n = 1, 4.
Figure 5.3 General form of the (2n + 4)-by-(2n + 4) adjacency matrix A_n.
Figure 5.4 The (2n + 4)-by-(2n + 4) matrix J_n.
Figure 5.5 A zoomed-in look at the Markov partition for T_n in [−1, 0].
Figure 5.6 Pictures of the roots of f_n and g_n for different values of n; roots of f_n are marked with crosses, roots of g_n are marked with circles, and the origin is marked with an asterisk (where A_n has a double eigenvalue). The circle of radius 2 is a dashed line, the unit circle is a solid line, and the circles with radius 1 ± n⁻¹ are dotted lines.
Figure 5.7 The map T̃_κ, for κ = 0.3.
Figure 5.8 Markov partitions for T̃_n, for n = 1, 4.
Figure 5.9 General form of the (n + 3)-by-(n + 3) adjacency matrix B_n.
Figure 5.10 The (n + 3)-by-(n + 2) matrix ι, representing the inclusion E⁺ → C^{n+3}.
Figure 5.11 The (n + 2)-by-(n + 2) matrix C_n, representing the action of A_n on E⁺.
Figure 5.12 The steeper function (in blue) is d_m(κ) and the flatter function (in red) is c_n(κ), for m = 2 and n = 3. The x-coordinate of the intersection is the κ value corresponding to (n, m) and the y-coordinate is the ζ value.

ACKNOWLEDGEMENTS

Those who have spent significant time with me over the course of my doctorate will not be surprised to see that my list of acknowledgements is a metaphorical mile long (for the purpose of using metric units, it’s actually about 1.15 metres, give or take). For my Master’s, I wrote no names; all of the people I am about to name deserve to be recognized for being wonderful people and making my life better in one way or another during my Ph.D. (Note that folks will only be named once, even if they played multiple roles in my life.) If I forgot anyone, that’s on me and not on you.

First and foremost, I would like to thank my supervisors, Christopher Bose and Anthony Quas. It has been an interesting six years and eight months! Thank you for being fabulous academic mentors. Your willingness to meet with me often, talk about whatever, and be candid with me has been a great help. It is striking how much more I am like Chris than Anthony, isn't it? But I learned so much from both of you regardless. I hope you two appreciated the time we spent together bashing our heads on the chalkboard in the lounge and bothering everyone on the fifth floor with our Skype conversations; I did some good work thinking on the fly, I think. I admit that I will appreciate not having to look at your comments on my work for a while, though. I should also express my appreciation for your financial support, through NSERC (in addition to my own scholarship).

Next, I would like to thank two people who have very little to do with my re- search and much more to do with my teaching and mental health, Jane Butterfield and Christopher Eagle. I will not forget how, back in 2014, you e-mailed me, Jane, asking about how on my webpage, I said I wanted to help folks learn how to mark better (read: without being miserable). I’m not sure I figured that one out, but anyway. Your open door and willingness to listen meant that it was probably helpful for you that I moved upstairs when I started my Ph.D. The vast amount that I learned about teaching from you is no less important than how much I learned about research over the years. In addition, thank you for making the Assistance Centre one of the places in which I most enjoyed being; it really felt like home. Chris, thank you (as well) for teaching me about teaching, but perhaps I appreciated even more your interest in research and your enthusiasm for sharing fun problems and interesting theorems. I know I jump straight to guessing that whatever you’re talking about has to do with

the Riemann or Continuum Hypotheses, but still! I am also glad that I was able to show you how Eisenstein's Criterion actually has a use in the wild; you'll keep that one with you forever.

I would like to give a blanket statement of thanks to the entire office staff in the Math and Stats department: Carol Anne Sargent, Amy Almeida, Patti Arts, and Kristina McKinnon (as well as the others who filled some of those spots throughout the years). I haven’t always been the model student in terms of finishing things much before deadlines (sorry, Amy), but you folks have always been great about keeping the department moving and setting me straight on whatever it is I’m confused about. You’re all amazing. The same goes for Kelly Choo, our systems administrator: I could always count on a good answer from you about all of my tech concerns.

There is a long list of the other faculty members in the department with whom I worked and had great conversations; in alphabetical order by last name, thanks to Trefor Bazett, Peter Dukes, Marcelo Laca, Mary Lesperance, Gary MacGillivray, Kieka Mynhardt, Svetlana Oshkai, and Ian Putnam for being wonderful individuals. Also thanks to Chedo Barone, who was not faculty but was one of my favourite people in the department: your smile always brightened my day.

At this point, I would like to turn my attention to some of my friends in Victoria. I found it was hard to make friends here, but these people gave me time and for that, I value them immensely. Julie Fortin, you are sharp, on-the-ball, and quick to smile; you made an amazing Director of Services and my time spent with you was always fun. Thank you, also, for actually reading this document! Janet Sit, I always appreciated your wit and your love of music (and also of science!). Thank you for running Trivia with me and hanging out afterwards; thank you for having adorable stuffed animals and an insatiable desire to learn. You are so courageous; never forget that. Alyssa Halpin, thank you for the hugs, primarily, but also thank you for the conversations and the perspective and your willingness to listen. It has always been nice to hang out with you (and Garrett Culos! I miss him). Alyssa Allen, I appreciated your smile and quirkiness and the fun times we had hanging out doing work or playing board games (when we could make it happen). You certainly helped me learn not to worry about texting. Elissa Whittington, thank you for the lunch hangouts and casual conversation; I’m glad that we could always find something to talk about over so many years.

I spent a lot of time volunteering for the UVic Graduate Students' Society (the GSS, in the common lingo) over my doctorate program; probably way more than reasonable, but here we are. Thank you to all of the Executive Board members who did such a great job keeping the GSS headed in the right direction (there are a lot of you). A special thank you goes to the lovely staff! Stacy Chappel, you have always been a fantastic resource and a good friend; you care so much. Karen Potts, I hope you find someone on Grad Council who cares about minutes and governance as much as I did. Neil Barney, you are such a fabulous person and we saw eye-to-eye on a lot of things; my Events Committee experience would not have been nearly as great without you. Joëlle Alice Michaud-Ouellet and Mindy Jiang, you have done such a great job with the Health and Dental Plan administration and you always had a smile for me in the office. Shout-outs to Rachel Lallouz; I always enjoyed seeing you in the office. Thank you to all of my Stipend Reviewers over the years; I especially liked this past year's edition, featuring my favourite cowboy-hat-wearing Nicholas Planidin. Thank you also to my button-making buddy, Brooklynn Trimble; thank you for taking me up on that opportunity and hanging out with me at other times (and giving the GSS reason to replace the circle cutter; glad you're okay). Here, I would also like to thank all of the people who came out regularly to my Trivia events: Tiffany Chan, Rose Morris, and Melanie Oberg from English, Kate Fairley and the Econ folks (with their silly team names), and many others.

I am very thankful that the Math and Stats department had a graduate student group: SIGMAS made it worth trying to organize events. Thank you to all of the people who enjoyed reading my Tea-Mail and eating my baked goods at Tuesday Tea! That was always a highlight for me. Thank you to my officemates who endured my loud keyboard and still talked to me every once in a while; special thanks go to my favourite big sister that I never had, Joanna Niezen. I appreciated all of our conversations and teasing and jokes and shared experiences; I'll take good care of Claire Pollenegger (our fake office plant, for those who didn't stop by). Thank you to the various members of the SIGMAS Executive who tolerated my insistence on minutes and whatnot, especially Laura Teshima; you are a wonderful person. Thank you to all of the people who made Chris Bruce's learning seminars fun: among others, Anna Duwenig, Mark Piraino, Emily Korfanty, Dan Hudson, Anthony Cecil, and Dina Buric. Thank you also to the folks who made the CMS Math Camps happen, at various times, including Amanda Malloch and Brittany Halverson-Duncan; we'll have to do pizza again when the world stops ending. I'll throw out some other names here for folks I appreciated,

no matter what role you had: Flora Bowditch, Mackenzie Wheeler, Chloe Lampman, Felicia Halliday, MacKenzie Carr, Kevin Hsu, Chi Kou, Sam Churchill, Jane Wodlinger, Michelle Edwards, Josh Manzer, and Kseniya Garaschuk.

I spent a large amount of time at workshops hosted by the Division of Learning and Teaching Support and Innovation (LTSI), which houses the former Learning and Teaching Centre. I value what I learned there, but at the same time I hope I was helpful to others at those same workshops (especially when I attended the opening session of the Fall TA Conference for the fifth, sixth, or seventh straight year). I especially want to thank Gerry Gourlay; your endless positivity and cheerfulness are infectious. I would not be as comfortable with the whole idea of intended learning outcomes without our conversations.

My physical health would like to thank the members of the Physics and Astronomy (and friends) intramural ball hockey team with whom I played for a few years; we weren’t that great, but we sure tried hard and had fun. I particularly remember Sandra Frey, Nick Fantin, Ashley Bramwell, Jared Keown, Collin Kielty, our two Clares (Higgs and Trotter), Jemma Green, Emma Loy, Zack Draper, Maan Hani, Ben Gerard, and Tony Kwan. I am so thankful that you welcomed me so readily.

I would like to make a point of thanking Jennifer Wong and Mira Cvitanovic for doing such a great job putting Convocation together; it was such a delight to volunteer as a robing assistant. I look forward to reprising that role while wearing my own robes, at some point!

I had the absolute pleasure of volunteering with UVic Orientation for many years, including this last year as the Graduate Tour Leader Cohort Lead. I have often said that the best opportunities for volunteering are when the people with whom you are working are fantastic; here is no exception. I want to thank, in particular, Kate Hollefreund, Jasmine Peachey, Nora Loyst, and Suriani Dzulkifli for being positive, enthusiastic, and incredibly well-organized facilitators of the Orientation program; I enjoyed working with you immensely, especially because you welcomed my willingness to help throughout the entire day of Graduate Student Orientation year after year. I also want to thank Russ Wong for keeping in touch and giving me an outlet for shared experiences; there aren’t many folks who know what Orientation is like at both UVic and the University of Waterloo, but here are two of us!

I had the pleasure of being involved in Three Minute Thesis and the Faculty of Graduate Studies Council. In particular, thank you to Carolyn Swayze, Bernadette Perry, and Karolina Papera Valente for all of your help in 3MT, and thank you to David Capson for doing such a great job as Dean for all of these years. I was also involved in the inaugural President's Fellowship in Research-Enriched Teaching; I am glad to have spent so much time with the other recipients, including Stephanie Field, Mary Anne Vallianatos, and Carla Osborne.

Lastly among my Victoria friends, I would like to acknowledge my Learning And Teaching in Higher Education (LATHE) 2018-2020 cohort: Janice Niemann, Natalie Boldt, Elizabeth Williams, Tasha Jarisz, Mitch Haslehurst, Pierre Iachetti, Mohamed Seifeldin, Jeremy Wintringer, and Héctor Vázquez (as well as Les Sylven, who was only with us for a term). Our classes felt like home to me. I will forever be indebted to you for your kindness and your wisdom; thank you for being my friends. I will always be there for you. I promise.

Moving out west in 2013, I left behind an entire undergraduate degree's worth of support network, but I still have many friends out in Ontario whom I still hold very dear (as evidenced by the cookies I have mailed across the country). Thank you to Michelle Cannon, Sandra Regier, Kimberley McClatchie, Heather (née Isenegger) and Michael Overmeyer, Christopher Snow, Sophie Twardus, Robin Lawrence, Paul Hendry, and Emma McCutcheon for being such wonderful people; there's a reason I always try to visit when I pass through. Special thanks to Paul and Emma (and their dog Arlo!), who made life in Victoria for me that much better while they were living there too.

I have three friends out east about whom I would like to be more specific. Thank you to Carolyn Kimball, for continuing to talk with me even after going to Switzerland and back; I believe in your ability to succeed in whatever you are doing. You are such a caring person. Thank you to Katie Schreiner; you have a fantastic taste in church services and music, and I have appreciated all of the time we have spent together. Maybe we'll write more songs together, sometime? If you don't get to be an astronaut, then you'll still get to canoe and play your ukulele (okay, and probably do some cool river dynamics); I hope that's something. And perhaps above all, thank you to Melissa Snow (née Pettau): you have always been there for me, even if only as a string of characters on a screen. I will always be grateful for our conversations and your friendship.

Finally, thank you to my extended family. I moved to Victoria partially because I had family in town; I have loved all of the joint birthday celebrations and playing baseball and eating amazing dinners. I could not have asked for better family. Thank you also to my brother, Jeffrey: though we don't talk that much, we have a much better relationship than we did back when we were in high school. We should chat more; maybe one day I'll come to visit you.

DEDICATION

My dissertation is dedicated to my parents, Dan and Lina. Whether I have noticed or not, they have always been supportive of me and have never once pushed me to do the things I have ended up doing. They let me grow and figure things out on my own and helped me up whenever I tripped and fell. I probably would not have ended up here without them; in particular, I am glad that I realized that I still know their landline number by heart, because while I just listed a whole bunch of people with whom I have enjoyed interacting, my parents are the only people I know who are unconditionally willing to take phone calls about anything. Incidentally, they have also been fantastic landlords. I really did luck out, no matter what they say. I hope they are proud that I got this far.

Chances are, I have not said the following enough in my life, so I will write it in the most conspicuous space in the entire document:

I love you both so very much.

Chapter 1 Introduction

Dynamical systems are, whether people are aware of it or not, everywhere around us. At a very high level, a dynamical system is some set of things (maybe states of an object, or gas particles in a box; maybe the Moon revolving around the Earth, or the tides on the beach) together with rules for how these things change. It is natural to ask questions about these systems: what will usually happen, and how long does it take? Does the system tend to an equilibrium state, like a chemical reaction or a simple spring? What happens if, instead of a fixed set of rules, the rules change over time; can we still expect the system to tend to some sort of equilibrium at some rate? For a natural system, we find mathematical models to describe the system and apply various methods to answer questions about the system's properties. For certain models, we may even need to first develop new tools in order to answer more than the simplest questions about the model. The contribution of this dissertation is exactly that: new tools developed to study an interesting model of dynamical systems.

Markov Chains and Deterministic Dynamical Systems

Consider, for the time being, a column-stochastic d-by-d matrix P. The matrix P represents a finite-dimensional Markov chain, a stochastic model where states transition to one another with some probability at discrete time steps according to the entries in the matrix. Thus, if at time 0 the probabilities of being in each of the d states are given by the vector x, then the probabilities of being in each of the d states at time 1 are given by Px (P acting on x); see Figure 1.1. The matrix P is called the probability transition matrix for the Markov chain. We already know how to study finite-dimensional Markov chains: the asymptotic properties of the Markov chain, such as what the stationary distribution is, are determined by the spectrum of the matrix P. Tools arising from linear algebra, potentially including numerical computation techniques, then allow us to compute these desired quantities: the stationary distribution is the eigenvector corresponding to the eigenvalue 1, for example. In particular, in the case that the Markov chain is mixing (that is, there is a unique stationary distribution to which all initial distributions converge), we wish to find the modulus of the second-largest eigenvalue(s), which tells us the rate at which the Markov chain converges to the stationary distribution. The mixing time for the chain is then at most proportional to the reciprocal of the logarithm of the modulus of the second-largest eigenvalue.[1] To prove that a Markov chain is mixing, if the transition matrix P is primitive, meaning that there is a power of P that has all positive entries, then we can use the classical Perron-Frobenius theorem [15, 42] to show that the chain has a unique stationary distribution and a second-largest eigenvalue of modulus strictly less than 1.

Figure 1.1: A Markov chain with four states and its associated transition matrix.

How does the Markov chain model fit into our dynamical systems perspective? Instead of seeing the chain as a stochastic process, we can view the elements of the system as the probability vectors, and the elements change via multiplication by the probability transition matrix, so that the rule for change is simply multiplication by P. We could even look at all vectors in our finite-dimensional space instead of just those with non-negative entries summing to 1; then we really are looking at the action of a (bounded) linear operator on a vector space. As mentioned above, we certainly have tools for analysis of this situation!

[1] For the proof of this fact and for more on Markov chains, see the book by Levin, Peres, and Wilmer [30]; applications include statistical mechanics and Markov chain Monte Carlo (MCMC).
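To make the linear-algebra picture concrete, here is a small numerical sketch (not part of the dissertation) that computes the stationary distribution and the second-largest eigenvalue modulus of a column-stochastic matrix; the particular 4-by-4 matrix is a hypothetical stand-in for the one in Figure 1.1.

```python
import numpy as np

# A hypothetical column-stochastic transition matrix (columns sum to 1)
# for a four-state mixing Markov chain.
P = np.array([[0.5, 0.2, 0.1, 0.1],
              [0.3, 0.6, 0.1, 0.1],
              [0.1, 0.1, 0.4, 0.4],
              [0.1, 0.1, 0.4, 0.4]])

vals, vecs = np.linalg.eig(P)
order = np.argsort(-np.abs(vals))          # sort eigenvalues by modulus
vals, vecs = vals[order], vecs[:, order]

pi = np.real(vecs[:, 0])                   # eigenvector for eigenvalue 1
pi /= pi.sum()                             # normalize to a probability vector

lam2 = np.abs(vals[1])                     # second-largest eigenvalue modulus
print("stationary distribution:", pi)
print("|lambda_2| =", lam2)
print("mixing rate constant 1/|log|lambda_2|| =", 1 / abs(np.log(lam2)))
```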

Returning to the general dynamical systems setting, if we have a map T on some state space X, some questions we would like to answer are "what happens to most of the orbits of T over a long time?" and "do regions of X mix together over time, and at what rate?" These questions are less about looking at individual orbits of points under T and more about looking at what happens on average. Specifically, we can learn much about the dynamical system (X, T) by studying how probability densities on X change over time under the action of T. If our state space X is finite, then this feels very familiar; it is the Markov chain situation again! But if X is more general, like an interval or a riverbed or an ocean, then the densities (probability or heat or similar) on X no longer form a subset of a finite-dimensional vector space; they are now infinite-dimensional, and we must find new tools to perform the analysis that we desire.

Figure 1.2: The stirring map T that moves chocolate chips around in the banana bread batter [−1, 1].

As both a mostly-concrete example and a rough analogy (and to introduce the key example that we will explore in much greater detail later), consider the space X = [−1, 1] equipped with normalized Lebesgue measure λ, and let f ∈ L¹(λ) be a probability density (that is, ‖f‖₁ = 1 and f ≥ 0). We can imagine that the space [−1, 1] is a bowl of banana bread batter into which one has placed chocolate chips, and f is the density of chocolate chips. Then, let T be a map like the one pictured in Figure 1.2. Applying the map T stirs the space up like you would with a spoon or a mixer, moving the chocolate chips around; there is then a new density, call it Lf, that describes the new locations of the chocolate chips; see the schematic diagram in Figure 1.3. Some parts of the batter may have more chocolate chips than before, and some fewer, but the total amount of chocolate chips has not changed. If we continue to apply the map T to get, after n steps, Tⁿ, then we should get something like Lⁿf, a new density that describes where the chocolate chips are after n steps. It turns out that the operator L can be defined on all of L¹(λ), and is bounded and linear; we call L the Perron-Frobenius operator associated to T. To be rigorous, Lf is the Radon-Nikodym derivative of the measure A ↦ λ(f 1_{T⁻¹(A)}), which exists in this case because T is a non-singular map (if A has zero measure, then T⁻¹(A) also has zero measure); see, for example, Chapter 4 of [7].
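As a rough illustration of how one can compute with L in practice, the following sketch uses Ulam's method (a standard discretization scheme, not the approach taken in this dissertation) to approximate the Perron-Frobenius operator of a simple expanding map by a stochastic matrix; the full tent map here is a stand-in for the maps studied later.

```python
import numpy as np

# Ulam's method: partition [-1, 1] into k cells and estimate the transition
# mass between cells by sampling; column j of L is then an approximate image
# of the indicator density of cell j.
def T(x):
    return 1.0 - 2.0 * np.abs(x)          # full tent map on [-1, 1]

k, samples = 100, 2000
edges = np.linspace(-1.0, 1.0, k + 1)
rng = np.random.default_rng(0)
L = np.zeros((k, k))

for j in range(k):
    x = rng.uniform(edges[j], edges[j + 1], samples)
    cells = np.clip(np.searchsorted(edges, T(x)) - 1, 0, k - 1)
    np.add.at(L[:, j], cells, 1.0 / samples)

# Push an uneven mass distribution forward; the total mass is preserved,
# and the distribution flattens toward the invariant (uniform) density.
f = np.zeros(k)
f[: k // 4] = 1.0 / (k // 4)              # mass concentrated on [-1, -1/2]
for _ in range(10):
    f = L @ f
print("total mass:", f.sum(), " spread:", f.max() - f.min())
```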

Figure 1.3: The Perron-Frobenius operator L sends densities f to new densities Lf (schematic only).

Now it really does look like we are doing something similar to the Markov chain case; we have a Banach space and a bounded linear operator acting on the space in a physically meaningful way. The operator L is the infinite-dimensional analogue of the transition matrix P for the Markov chain. If we want to find the analogue of a stationary distribution (called an invariant density in this setting), we find an eigenvector for L with eigenvalue 1; if all initial densities on [−1, 1] converge to an invariant density over time, then we have a good idea of where most of the points in [−1, 1] end up in the long run: no matter where they started, points will be distributed over [−1, 1] according to the invariant density. Moreover, if there is a gap in modulus between an eigenvalue of 1 and the rest of the spectrum, this gap describes how quickly this convergence occurs, in the same way as described above for Markov chains. Note that if the map T from Figure 1.2 had tents that did not cross the x-axis, then T would leave the regions [−1, 0] and [0, 1] invariant; it would be as though you separated the banana bread batter into two pieces and only mixed each piece with itself. It is easy to imagine that in this case, T would have two invariant densities: the characteristic functions on [−1, 0] and [0, 1], respectively.

Unfortunately, some technical details get in the way of the analysis; the spectrum of L on L¹(λ) is not particularly well-behaved in general,[2] so we find an invariant subspace of L¹(λ) on which the spectrum of L is much nicer, and then the analysis follows basically as in the Markov chain case, using more powerful and sophisticated tools. The desired subspace is that of the (equivalence classes of) bounded variation functions, which we will denote by BV(λ). Lasota and Yorke [29] showed, in 1973, that for a class of piecewise-expanding maps, one can find invariant densities by looking at bounded variation functions and using their namesake inequality; in 1984, Keller [26] discussed the link between the spectral theory of Perron-Frobenius operators acting on bounded variation functions and the rate of convergence of the underlying dynamical systems to an equilibrium. Other authors have used the compact embedding of BV inside of L¹ to perform similar types of spectral analysis (see Hennion [20] or Keller and Liverani [27]).

[2] As shown by Ding, Du, and Li [9], Perron-Frobenius operators can have L¹-spectrum equal to the entire closed unit disk.

Analogously to the classical Perron-Frobenius theorem in the Markov chain setting, there are similar general theorems, often called Krein-Rutman theorems after the generalization of the Perron-Frobenius theorem for compact positive operators by Krein and Rutman [28], that assist in analysis of these dynamical systems. The key aspect in all of these theorems is the notion of positivity. Roughly, the positive direction is indicated by a cone of vectors, and operators which preserve and contract the cone have very nice structure: there is a vector in the cone playing the role of the direction of largest growth, and a complementary subspace of non-positive vectors which grow more slowly than anything in the positive direction. This insight was notably abstracted by Garrett Birkhoff in the 1950s [5], and the study of abstract cones has played a role in the study of general topological vector spaces (for example, [41, 49]). In 1995, Liverani [33, 34] applied Birkhoff's technique to the study of piecewise-expanding C² dynamical systems, obtaining invariant densities and explicit decay of correlations for powers of a single map.
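Birkhoff's observation can be tested numerically. The sketch below (an illustration under arbitrary choices, not a computation from the text) measures the Hilbert projective metric on the positive orthant and compares the contraction achieved by a positive matrix with the Birkhoff coefficient tanh(Δ/4), where Δ is the projective diameter of the image cone.

```python
import numpy as np

def hilbert_metric(x, y):
    # Hilbert projective metric on the positive orthant of R^d.
    r = x / y
    return np.log(r.max() / r.min())

rng = np.random.default_rng(1)
A = rng.uniform(0.1, 1.0, size=(5, 5))   # positive matrix: preserves the orthant

# Sampled (lower) estimate of the projective diameter of the image cone.
pts = rng.uniform(0.01, 1.0, size=(50, 5))
diam = max(hilbert_metric(A @ x, A @ y) for x in pts for y in pts)
c = np.tanh(diam / 4.0)                  # Birkhoff contraction coefficient

# The observed ratio should not exceed tanh of the true diameter over 4.
x, y = pts[0], pts[1]
ratio = hilbert_metric(A @ x, A @ y) / hilbert_metric(x, y)
print("diameter estimate:", diam, " tanh(diam/4):", c, " observed ratio:", ratio)
```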

Random Dynamical Systems

We can then take this idea one more step forward. Typically, when mixing banana bread batter, you would not always mix it the same way. This idea corresponds to applying not just a single map T to [−1, 1] at each step, but rather a map chosen from a family of maps T_ω, parameterized by some set Ω, with a different map chosen at each time step. When we look at the composition of a number of these maps, it gives us a notion of random dynamical system; the adjective is appropriate, at least by analogy with the batter mixing, because you do not know exactly how you will have mixed the batter after n steps. For a more rigorous formulation, we turn to abstract measurable dynamical systems theory. Let µ be a probability measure on the set Ω, and suppose that σ : Ω → Ω is a measurable map such that (µ, σ) is an ergodic map-and-measure pair; for short, we say that (Ω, µ, σ) is an ergodic probability-preserving transformation (taking the underlying σ-algebra on Ω for granted). Suppose that there is a measurable assignment of some map T_ω to each ω in Ω. Then we can define the n-th step map T_ω^(n) to be

T_{σ^{n−1}(ω)} ◦ · · · ◦ T_ω,

where we compose the maps T_{σ^i(ω)} along the orbit of ω under σ. Since (Ω, µ, σ) is ergodic, the map T_ω^(n) is random, in the sense that on average in the long run, each of the individual maps is chosen roughly according to the probability distribution µ (which may not be true for short time scales). We call the map T_ω^(n) a cocycle of maps. We will be assuming that σ is also invertible throughout, so that (Ω, µ, σ) is an ergodic invertible probability-preserving transformation.

We are interested in the long-term behaviour of the random dynamical system, just like for our deterministic systems from before. Is there something like a random stationary distribution? Do densities converge at some rate to the stationary distribution; if so, what is the rate?

Unlike before, we observe that it makes no sense to try to consider eigenvalues of the associated Perron-Frobenius operators L_ω^(n) for T_ω^(n); since the operators L_ω^(n) are also compositions of the individual operators L_{σ^i(ω)} along the orbit of ω (hence a cocycle of operators), the eigenvalues would depend on ω and n and could easily be different for each pair in all but the simplest cases. Instead, we have different quantities, called Lyapunov exponents, that play the same role as the eigenvalues, but focus more on the asymptotic and on-average behaviour of the maps T_ω^(n); instead of eigenspaces, we have Oseledets spaces, named for the first person to introduce them (in [40]). The associated Perron-Frobenius operators are invariant on these subspaces, and functions (densities) in these subspaces grow at specific rates corresponding to the Lyapunov exponents.

The framework for studying the actions of these random compositions of linear operators is multiplicative ergodic theory, named after Oseledets's Multiplicative Ergodic Theorem (MET) for cocycles of matrices in [40]. There is a rich literature of generalizations of the original MET, first to cocycles of compact operators on Hilbert spaces [46], then to cocycles of compact operators on Banach spaces [35], and then to quasi-compact cocycles of bounded operators on Banach spaces with varying measurability and continuity requirements [31, 50]; in every case, the base system (Ω, µ, σ) and the matrices or operators are invertible. More recently, these generalizations have been extended [16, 17, 19] to allow for no invertibility conditions on the operators while retaining the full decomposition of the Banach space into Oseledets spaces, as long as the base system (Ω, µ, σ) remains invertible (previously, for non-invertible operators one only obtained a "flag" of subspaces on which vectors grow at a rate at most equal to one of the Lyapunov exponents, instead of exactly at that rate). An MET can be seen as the replacement for diagonalizability of a single matrix in the cocycle setting (although, as many of the proofs indicate, it takes on more of a singular value decomposition flavour, as seen in [16, 44]).

All of this said, one does not need an MET to define Lyapunov exponents, just as one does not need diagonalizability to define eigenvalues, so we can see how other tools play a role and interact with the MET in a given situation. In particular, is there some sort of generalization of the classical Perron-Frobenius theorem or Krein-Rutman theorems for cocycles of matrices or operators with some notion of positivity, that would guarantee the existence of something like an invariant subspace corresponding to the largest growth rate? The answer: yes! (Though the subspaces are called equivariant rather than invariant.) The literature is fairly extensive on this subject; we give a sampling. In 1988, Ferrero and Schmitt [14] proved a cocycle Perron-Frobenius theorem for cocycles of transfer operators on symbolic dynamics spaces. In 1994, Arnold, Demetrius, and Grunwald [2] proved a cocycle Perron-Frobenius theorem for a class of positive matrix cocycles arising in evolutionary biology, based on the Birkhoff cone technique, for the purpose of obtaining a random invariant density. Hennion [21], in 1997, used positivity to establish properties of stochastic products of matrices. Evstigneev and Pirogov [13] established a cocycle Perron-Frobenius theorem for non-linear operators in finite dimensions in 2009.

In 2010, Rugh [47] developed the theory of complex cones and proved a cocycle Perron-Frobenius theorem for linear operators preserving complex cones (as opposed to real cones, in real Banach spaces). Mierczyński and Shen [39], in 2013, proved a Perron-Frobenius theorem for cocycles of compact operators. In 2015, Lian and Wang [32] proved a cocycle Perron-Frobenius theorem for finite-dimensional operators preserving cones of specific lower dimensions.

When we apply Multiplicative Ergodic Theorems to quasi-compact cocycles of Perron-Frobenius operators arising from random dynamical systems, the Oseledets spaces can be interpreted as indicating "coherent structures" in the system, as described in [16]. When the spaces correspond to large Lyapunov exponents of the cocycle (when compared to local expansion and dispersion), the Lyapunov exponents describe how these parts of the system are "slowly exponentially mixing", in the sense that they are not invariant sets but they mix with the rest of the space more slowly than one would expect from local expansion. When the Oseledets space corresponding to the zero Lyapunov exponent turns out to be one-dimensional, there is a complementary equivariant family of subspaces on which the Perron-Frobenius operator cocycle, restricted to the spaces, has the second-largest Lyapunov exponent. Thus, if we are able to compute a gap between the largest and second-largest Lyapunov exponents, then we can show that all dissipating coherent structures mix with the space at least at some rate.

Specifically for cocycles of Perron-Frobenius operators, Buzzi [8] in 1999 extended Liverani's application of Birkhoff's positive operator framework to cocycles of maps, instead of a single map, to obtain decay of correlations for certain random dynamical systems where not every Perron-Frobenius operator is required to preserve a cone. In this case, the largest Lyapunov exponent is equal to 0 and corresponds to an equivariant family of densities. Then, the decay of correlations is related to the second-largest Lyapunov exponent for the cocycle of operators, and its magnitude gives the logarithm of the rate of decay of correlations. However, the constants involved in the proof of Buzzi's result are not easily identifiable (see the remark on pg. 28).
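For intuition, Lyapunov exponents of a matrix cocycle can be estimated numerically by the standard QR re-orthogonalization method; the sketch below does this for a cocycle built from two arbitrary positive matrices driven by i.i.d. coin flips (a Bernoulli base system). None of the particular matrices or parameters comes from the dissertation itself.

```python
import numpy as np

rng = np.random.default_rng(2)
mats = [rng.uniform(0.1, 1.0, size=(3, 3)) for _ in range(2)]  # the two maps

n = 20000
Q = np.linalg.qr(rng.normal(size=(3, 2)))[0]   # track a 2-dimensional frame
sums = np.zeros(2)
for _ in range(n):
    A = mats[rng.integers(2)]                  # random choice along the orbit
    Q, R = np.linalg.qr(A @ Q)
    sums += np.log(np.abs(np.diag(R)))         # accumulate local growth rates

lam1, lam2 = sums / n
print("lambda_1 ~", lam1, " lambda_2 ~", lam2, " gap ~", lam1 - lam2)
```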

Summary of Results

As outlined previously, we are therefore interested in finding an upper bound for the second Lyapunov exponent for the Perron-Frobenius cocycle corresponding to a random dynamical system T_ω^(n), as that would give us a minimal mixing rate for the system (or decay of correlations). The approach we take is to prove a generalized Perron-Frobenius theorem for a fairly general class of cocycles of operators on a Banach space preserving a cone. In particular, no compactness is required (as in [39]); however, we do require that almost every operator preserves the cone (which is a stronger restriction than what Buzzi works with in the specific cases in [8]).

The setup we use allows us to obtain a measurable equivariant decomposition of the Banach space into an equivariant positive direction with the largest growth rate and an equivariant family of subspaces of non-positive vectors growing at the next largest growth rate. An important outcome of the theorem is a rigorous quantitative bound on the second-largest Lyapunov exponent for such cocycles that is computable without tracing constants along in the proof of the theorem itself. Another important aspect of the theorem is that the proof is completely independent of any Multiplicative Ergodic Theorem; hence, the hypotheses provide a checkable condition for quasi-compactness and thus a full Oseledets decomposition for the cocycle after applying an MET in the appropriate setting. A summarized version of the theorem follows; see Section 2.4 for more details, including required definitions and what is meant by measurable in our context. Moreover, Corollary 2.4.5 provides even simpler sufficient conditions for the existence of the quantities listed here in the hypotheses.

Theorem A. Let (Ω, B, µ, σ, X, ‖·‖, L) be either a strongly measurable random dynamical system or a µ-continuous random dynamical system, such that log⁺‖L(1, ·)‖ ∈ L¹(µ). Let C ⊆ X be a nice cone such that L(1, ω)C ⊆ C for all ω. Suppose that there exists a positive measure subset G_P of Ω, a positive integer k_P, and a positive real number D_P such that for all ω ∈ G_P, diam_θ(L(k_P, ω)C) ≤ D_P. Then there exists a σ-invariant set of full measure Ω̃ ⊆ Ω on which the following statements are true:

1. There exist measurable functions v(ω) ∈ X and η(ω, ·) ∈ X* such that

X = span_R{v(ω)} ⊕ ker(η(ω, ·))

is a measurable equivariant decomposition, the Lyapunov exponent for v(ω) is λ₁, and all vectors in ker(η(ω, ·)) have Lyapunov exponent strictly less than λ₁, unless λ₁ = −∞.

2. When λ₁ > −∞, we have

λ₂ ≤ λ₁ − (µ(G_P)/k_P) log(tanh(D_P/4)⁻¹) < λ₁.

We remark specifically on the quantitative bound on the second-largest Lyapunov exponent. The set G_P and the quantities k_P and D_P are dependent on the cone and the cocycle and can therefore be computed outside of the proof of the theorem. The theorem is therefore something of a black box for the bound and, subsequently, a minimal mixing rate or decay of correlations. Outside of cocycles of positive operators, this problem is quite difficult.
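For a sense of scale, the bound in item 2 is trivial to evaluate once G_P, k_P, and D_P are in hand; the numbers below are placeholders chosen for illustration, not values derived in the text (the bound is used in the equivalent form λ₁ + (µ(G_P)/k_P) log tanh(D_P/4)).

```python
import numpy as np

# Hypothetical inputs to the Theorem A bound; in applications these come
# from the cone and the cocycle (see Chapter 4 for actual computations).
lam1 = 0.0     # for Perron-Frobenius cocycles the top exponent is 0
mu_GP = 0.5    # measure of the set of good fibres G_P
k_P = 2        # number of steps after which the cone is contracted
D_P = 3.0      # bound on the projective diameter of the image cone

lam2_bound = lam1 + (mu_GP / k_P) * np.log(np.tanh(D_P / 4.0))
print("lambda_2 <=", lam2_bound)   # strictly less than lambda_1
```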

To demonstrate the use of the theorem, we apply it to the situation of a cocycle of piecewise expanding maps with a specific form; all of the maps T_ω are of the same form as the map pictured in Figure 1.2. These "paired tent maps" act on [−1, 1] and leave [−1, 0] and [0, 1] mostly invariant, except for "leaking" mass of size ε₁(ω) ≥ 0 from [−1, 0] to [0, 1] and mass of size ε₂(ω) ≥ 0 in the other direction (see Chapter 4 for the precise definitions and Figure 1.4 for a simple schematic diagram).

Figure 1.4: Schematic of "leaking" behaviour, where ε₁(ω) and ε₂(ω) are generally small.

Maps like these have been considered by González Tokman, Hunt, and Wright in [18]; in that work, the authors fix a single perturbation of a map that leaves two sets invariant and investigate properties of the invariant density and the eigenvector for the second-largest eigenvalue for the perturbed map, in terms of the two invariant densities for the unperturbed map. They find that the eigenvector corresponding to the second-largest eigenvalue is asymptotically a scalar multiple of the difference of the two invariant densities for the unperturbed map, which indicates a coherent structure related to the transfer of mass between the two parts of the space. In [10], Dolgopyat and Wright take a similar situation but analyze the restrictions of the map to parts of the space, where the "leaking" of mass is seen as holes in the system. Looking at these open systems, the largest eigenvalues have a particular form related directly to the sizes of the mass transfer/leaking (which are framed as transition probabilities of a related Markov chain). In our case, we are interested in generalizing these ideas to the non-autonomous setting, to see how random mass transfer impacts the value of the second-largest Lyapunov exponent for the cocycle of Perron-Frobenius operators (instead of just a single map).

In the setting of these cocycles of paired tent maps, we are able to show that the hypotheses of Theorem A are true, taking the Banach space to be (L∞ equivalence classes of) bounded variation functions and finding a suitable cone that is preserved by all of the associated Perron-Frobenius operators. Thus we obtain an equivariant density for the cocycle, and an upper bound for the second-largest Lyapunov exponent in terms of ε₁ and ε₂ and quantities related to both the map and the cone.

Next, we study the response of the system upon scaling ε₁ and ε₂ by some parameter κ and taking κ to 0, which simulates shrinking a perturbation of the map T_{0,0} back towards the original map. In this way, we can see how the second-largest Lyapunov exponent behaves under perturbations; one might hope that it shrinks as a nice function of κ (linear, say) until at κ = 0 the top Oseledets space becomes two-dimensional (spanned by the two invariant densities of T_{0,0}) and the zero Lyapunov exponent obtains multiplicity two. Our results are outlined in the following theorems, with precise statements to follow in the body of the paper.

Theorem B. Let (Ω, B, µ, σ) be an ergodic, invertible, probability-preserving transformation, and let ε₁, ε₂ : Ω → [0, 1] be measurable functions which are both not µ-a.e. equal to 0 and which both have countable range. Let T_ω = T_{ε₁(ω),ε₂(ω)} be defined as above. Then there exists a readily computed number C such that

λ₂ ≤ C < 0 = λ₁,

where λ₁ and λ₂ are the largest and second-largest Lyapunov exponents for the cocycle of Perron-Frobenius operators associated to T_ω^(n).

Theorem C. Let (Ω, µ, σ), ε₁, and ε₂ be as in Theorem B. Let κ ∈ (0, 1], and consider the cocycle of maps T^(n)_{κε₁(ω),κε₂(ω)}. Then there exists c > 0 such that for sufficiently small κ, the second-largest Lyapunov exponent λ₂(κ) for the cocycle of Perron-Frobenius operators satisfies

λ₂(κ) ≤ −cκ.

Theorem D. The estimate in Theorem C is sharp, in the following sense. Set ε₁ = ε₂ = 1 for all ω. Then there is a sequence (κ_n)_{n=1}^∞ ⊆ (0, 1/2) such that κ_n → 0, each T_{κ_n,κ_n} is Markov, and λ₂(κ_n) is asymptotically equivalent to −2κ_n.

We emphasize that these results apply to an entire parameterized family of maps, and thus they give a general statement on the asymptotic properties of the second-largest Lyapunov exponent for these maps; to the best of our knowledge, this is the first time λ₂ has been upper-bounded for a family of maps, with an asymptotic estimate on the order of the bound in the scaling parameter. Note also that Theorems B and C are consequences of Theorem A applied to different quantities k_P, G_P, D_P. The primary work done, outside of showing that the hypotheses of Theorem A are satisfied, is to obtain expressions for each of those quantities. Theorem D is shown by direct computation with a specific class of paired tent maps.

In the process of applying Theorem A to the cocycle of Perron-Frobenius operators associated to the paired tent maps, we happen to require a new Lasota-Yorke-type inequality for Perron-Frobenius operators acting on bounded variation functions. Its utility comes from being sufficiently strong to force small coefficients of the variation terms, but balanced in such a way as to provide uniform bounds on both terms over a family of maps, not just one map individually. The inequality is based on Rychlik's work [48]; we prove the inequality in a similar level of generality, to provide a tool for future work. For details, see Chapter 3.

The remainder of the dissertation is as follows. In Chapter 2, we give some required background on cones, measurability, and Lyapunov exponents before stating and proving our cocycle Perron-Frobenius theorem. In Chapter 3, we briefly set up, state, and prove a new balanced Lasota-Yorke-type inequality. In Chapter 4, we use that new Lasota-Yorke inequality to apply our cocycle Perron-Frobenius theorem to cocycles of paired tent maps as described above, to prove the aforementioned bound in Theorem B on the second-largest Lyapunov exponents for the Perron-Frobenius operators, and then find the perturbation estimate in Theorem C. In Chapter 5, we prove Theorem D by studying in-depth a specific class of Markov paired tent maps that turn out to be very amenable to analysis via standard finite-dimensional linear algebra techniques, and allow for explicit computation of the second-largest Lyapunov exponents (through eigenvalues). Theorem D is a step towards trying to answer a related but possibly harder question: what is a lower bound for the second-largest Lyapunov exponent? Finally, there is an appendix containing miscellaneous technical results that are used in various places, and the collection of references at the end. The majority of this work is contained in two submitted papers; preprints can be found at [23, 24].

Chapter 2 Cocycle Perron-Frobenius Theorem

As mentioned in the Introduction, we can obtain detailed information about invariant densities, ergodicity, and mixing properties of a dynamical system by looking at the spectral properties of the associated transfer operators. In the finite-dimensional discrete case of a Markov chain, the operator is simply the transition probability matrix, and we can utilize the classical Perron-Frobenius theorem, proved first by Perron in 1907 for positive matrices [42] and extended by Frobenius to primitive non-negative matrices in 1912 [15]. Let the spectral radius of a matrix A be given by ρ(A). The classical Perron-Frobenius theorem states the following (among other things). Given a primitive non-negative d-by-d matrix P, there exist v, w ∈ (R_{>0})^d and λ ∈ [0, ρ(P)) such that the following statements are true:

• the spectral radius ρ(P) is a simple eigenvalue for P, with eigenvector v;

• there is no other eigenvalue of P with modulus ρ(P);

• for any x ∈ R^d, we have ρ(P)⁻ⁿPⁿx → (w · x)v as n → ∞; and

• if w · x = 0, then ‖Pⁿx‖ ≤ Cλⁿ for some C depending on x.
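These conclusions are easy to check numerically; in the sketch below (an illustration only, with an arbitrary primitive column-stochastic matrix, so that ρ(P) = 1), the iterates Pⁿx converge to (w · x)v, and a vector with w · x = 0 decays geometrically.

```python
import numpy as np

P = np.array([[0.6, 0.3, 0.2],     # an arbitrary primitive column-stochastic
              [0.3, 0.4, 0.3],     # matrix, so rho(P) = 1
              [0.1, 0.3, 0.5]])

vals, vecs = np.linalg.eig(P)
v = np.real(vecs[:, np.argmax(np.abs(vals))])
v /= v.sum()                                   # right eigenvector for rho(P)
wvals, wvecs = np.linalg.eig(P.T)
w = np.real(wvecs[:, np.argmax(np.abs(wvals))])
w /= w @ v                                     # left eigenvector, scaled: w.v = 1

x = np.array([1.0, -2.0, 5.0])
y = x.copy()
for _ in range(60):
    y = P @ y                                  # rho(P) = 1, so no rescaling needed
print("P^n x ->", y, " (w.x) v =", (w @ x) * v)

z = x - (w @ x) * v                            # now w.z = 0
print("||P^60 z|| =", np.linalg.norm(np.linalg.matrix_power(P, 60) @ z))
```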

The quantity λ is the modulus of the second-largest eigenvalue(s). When applied to Markov chains, the theorem yields a unique stationary distribution in v (because ρ(P) is equal to 1), convergence of any initial probability state to that stationary distribution, and the exponential rate of that convergence (λ, as can be seen by writing the vector x − (w · x)v in terms of (generalized) eigenvectors corresponding to the smaller eigenvalues). If λ < ρ(P), then we say that P has a spectral gap; the spectral gap is the primary driver of the exponential rate of convergence.

We wish to extend the classical Perron-Frobenius theorem as above to a theorem for cocycles of operators on a Banach space that has similar conclusions. The stationary distribution will be an equivariant vector, and the spectral gap would indicate a decay rate for all vectors that do not have a component in the direction of the equivariant vector. To do this, we must find an appropriate substitution for the primitivity condition. As motivation for what the new primitivity condition should be, observe that if x is a vector with non-negative entries, then a non-negative matrix P preserves that property: Px also has non-negative entries. Said another way, the matrix P preserves the non-negative orthant R^d_{≥0}. This fact is a very specific case of the much more general principle of positive operators, which in general is described using cones. We will give background on positivity and a related quantity called the projective metric. After describing measurable and topological considerations, as well as recapping facts about cocycles and Lyapunov exponents, we will state and prove our generalization of the Perron-Frobenius theorem, Theorem 2.4.3. Afterwards, we apply the theorem to a number of easy examples to demonstrate its strength and limitations.

2.1 Cones

The generalization of positivity from R to general vector spaces is through cones. We will define the specific type of cones we need on Banach spaces and provide examples relevant to our applications. From the definition of our cones we obtain a partial order and a projective pseudo-metric on the cones, both of which we relate to the norm on the Banach space. The relationship between the cone, the pseudo-metric, and the norm provides sufficient structure to understand the implications of bounded linear operators that preserve the cone.

Cone Properties and Examples

We begin with the definitions and immediate results. Following that, we provide examples that will be used elsewhere. References for some of this material include Schaefer's classic text on topological vector spaces [49] and Peressini's book on ordered topological vector spaces [41].

Definition 2.1.1. Let (X, ‖·‖) be a real Banach space. A cone is a set C ⊆ X that is closed under scalar multiplication by positive numbers, i.e. λC ⊆ C for all λ > 0. We define a nice cone on (X, ‖·‖) to be a cone C ⊆ X that has the following properties:

• C is convex (equivalently, closed under addition);

• C is blunt, i.e. 0 ∉ C (as opposed to pointed, where 0 ∈ C);

• C is salient, i.e. C ∩ (−C) = ∅ (more generally ⊆ {0});

• C ∪ {0} is closed (we will call C a "closed" cone even if 0 is not in C);

• C is generating (or generates X), i.e. C − C = X;

• C is D-adapted (to ‖·‖), i.e. there exists D ∈ R_{≥1} such that for x ∈ X and y ∈ C, if y ± x ∈ C ∪ {0}, then ‖x‖ ≤ D‖y‖.

If x ∈ C, we say that x is a positive element of X. The terminology is mostly self-explanatory, with the exception of "salient": this word can refer to something pointing outward, roughly speaking. A cone that contains no vectors also in its negative (that is, a cone that contains no one-dimensional subspace) can be seen as identifying an outward direction, so the term salient is not as strange as it first appears. From the cone, we obtain a partial order on the Banach space; this order justifies calling elements of the cone "positive". Alternatively, from any partial order on the Banach space we can take the positive elements to form a salient convex cone, but we will not need this perspective.

Lemma 2.1.2. Let X be a real Banach space, and C a salient convex cone. Then C induces a partial order on X, denoted ⪯_C (or ⪯ when the choice of C is clear), by x ⪯_C y when y − x ∈ C ∪ {0}. Moreover, ⪯ is a vector order; that is, if c > 0, x, y, z ∈ X, and x ⪯ y, then cx ⪯ cy and x + z ⪯ y + z. Finally, if C ∪ {0} is closed, then whenever x_n ⪰ 0 for all n and x_n converges to x, we have x ⪰ 0 also.

Proof. If x ∈ X, then x ⪯_C x because x − x = 0 ∈ C ∪ {0}. If x ⪯_C y and y ⪯_C x, then y − x ∈ (C ∪ {0}) ∩ (−(C ∪ {0})) = {0}, by salience of the cone, so x = y. If x ⪯_C y and y ⪯_C z, then we have

z − x = (z − y) + (y − x) ∈ C ∪ {0},

so that x ⪯_C z. Suppose that c > 0, x, y, z ∈ X, and x ⪯ y. Then c(y − x) ∈ C ∪ {0} and (y + z) − (x + z) = y − x ∈ C ∪ {0}, so cx ⪯ cy and x + z ⪯ y + z. If C ∪ {0} is closed, the limit of any sequence in C ∪ {0} remains in C ∪ {0}, which proves the statement.

Remark 2.1.3. When Lemma 2.1.2 applies, we see that the D-adapted condition can be rephrased to say that if −y ⪯_C x ⪯_C y, then ‖x‖ ≤ D‖y‖. Note that this inequality forces y ∈ C. The D-adapted condition is, therefore, a way of connecting the order and the norm. With it, if an element of the space is bounded in the order, then it is actually bounded in the norm as well; this fact will be used to great benefit later. In the literature, if there is some D ≥ 1 such that C is D-adapted in (X, ‖·‖), then C is often called normal and there exists an equivalent norm to ‖·‖ such that C is 1-adapted with respect to that norm [49, Section V.3]. We will not use this fact, opting instead to work using the existing norm (the equivalent norm is a Minkowski functional for an appropriate saturated convex set).

Lemma 2.1.4. Let (X, ‖·‖) be a real Banach space with a salient closed convex cone C. If C has non-empty interior (with respect to ‖·‖) or if ⪯ is a lattice order (for a pair of vectors x, y ∈ X, there exists z ∈ X, denoted z = x ∨ y, such that (x + C) ∩ (y + C) = z + C), then C generates X.

For a lattice order, the quantity x ∨ y is also denoted sup(x, y), since it is the least upper bound for x and y.

Proof. Suppose that z an interior point of C. If x ∈ X and  > 0 such that z + x ∈ C, observe that 1 1 x = (z + x) − z ∈ C − C.   Hence C generates X. Now suppose that  is a lattice order for X. Define x+ = x ∨ 0 and x− = (−x) ∨ 0; we claim that x = x+−x−. Observe that for any z ∈ X, z+x∨y ∈ (x+z+C)∩(y+z+C), and any w ∈ (x+z+C)∩(y+z+C) is at least z+x∨y in ; thus z+x∨y = (x+z)∨(y+z). By this equation and the definitions of x±, we have

x + x− = (x − x) ∨ (x + 0) = x+, and hence x = x+ − x−. Thus C generates X (potentially by adding and subtracting a non-zero cone vector).

d d Example 2.1.5. Let X = R and equip X with the 1-norm k·k1. Then C = R≥0 \{0} is a nice cone called the positive orthant. The sum of two non-negative real numbers is non-negative, so C is closed under addition; by definition C is blunt and salient, and the closure of C is C ∪ {0}. To see that C is generating, observe that  is a lattice order d ± + on R and apply Lemma 2.1.4 (in particular, x are defined by (x )i = max{0, xi} and − (x )i = max{0, −xi}).

Note that x  y if and only if xi ≤ yi for each i = 1, . . . , d. To see that C is

D-adapted, suppose that −y  x  y; from this inequality, we have 0  2yi for each i, so each yi is non-negative. Then −yi ≤ xi ≤ yi for all i, or |xi| ≤ yi. Compute the norms: d d X X kxk1 = |xi| ≤ yi = kyk1 . i=1 i=1

Thus C is 1-adapted to k·k1. Example 2.1.6. Let (X, k·k) be a real Banach space, let f ∈ X∗ be a non-zero bounded linear functional, and let α ∈ (0, kfk). Then the subset Cf,α ⊆ X defined by

Cf,α = {x ∈ X : f(x) ≥ α kxk} \ {0}

15 kfk is a α -adapted blunt salient closed convex cone. To check that Cf,α satisfies the conditions, we use the properties of f as a bounded linear functional.

Suppose that x ∈ Cf,α and c > 0. We have f(cx) = cf(x) ≥ cα kxk = α kcxk, so cx ∈ Cf,α, and Cf,α is a cone. If x, y ∈ Cf,α and λ ∈ (0, 1), we have f(λx + (1 − λ)y) = λf(x) + (1 − λ)f(y) ≥ α(λ kxk + (1 − λ) kyk) ≥ α kλx + (1 − λ)yk , hence Cf,α is convex. By definition, Cf,α is blunt, as it does not contain 0. Suppose that x ∈ Cf,α ∩ −Cf,α. Then f(x) ≥ α kxk ≥ 0, but −x ∈ Cf,α, and so

0 ≥ −f(x) = f(−x) ≥ α k−xk = α kxk , which implies that x = 0, a contradiction. Thus Cf,α ∩−Cf,α is empty, i.e. Cf,α is salient.

If xn −→ x 6= 0 and each xn ∈ Cf,α, then observe that by continuity of the norm, n→∞

f(x) = lim f(xn) ≥ lim α kxnk = α kxk . n→∞ n→∞

Hence x ∈ Cf,α, and Cf,α ∪ {0} is closed. kfk To see the fact that Cf,α is α -adapted, suppose that −y  x  y for x, y ∈ X. Then f(y) ≥ α kyk ≥ 0, since y ∈ Cf,α ∪ {0}. Moreover, we have y ± x ∈ Cf,α ∪ {0}, so that

f(y) + f(x) = f(y + x) ≥ α kx + yk , f(y) − f(x) = f(y − x) ≥ α ky − xk .

We then apply the triangle inequality to obtain:

1 1 kxk ≤ ky + xk + ky − xk  ≤ f(y) + f(x) + f(y) − f(x) 2 2α f(y) kfk = ≤ kyk , α α

kfk so that Cf,α is α -adapted. Finally, suppose that there exists x ∈ Cf,α such that f(x) > α kxk. We claim that x is an interior point of Cf,α, so that Cf,α generates X by Lemma 2.1.4. Let  > 0 be less than (f(x) − α kxk)(α + kfk)−1 and let z ∈ B(0, ). Then we have

α kzk − f(z) ≤ |α kzk − f(z)| ≤ (α + kfk) kzk < f(x) − α kxk , |{z} < which then implies f(x + z) > α(kxk + kzk) ≥ α kx + zk .

Thus B(x, ) ⊆ Cf,α, and we see that Cf,α is a nice cone, known in the literature as a

16 Bishop-Phelps cone [43, 51]. Note also that in this situation, f is a strictly positive linear functional; not only is f non-negative on C, it sends every element of C to a positive number, using the fact that f(x) ≥ α kxk > 0 for all x ∈ C (as C is blunt). Strictly positive linear functionals have interesting properties regarding cones; their existence is equivalent to existence of a convex “base” for the cone (see [41]), and if a linear operator preserves a cone, then the adjoint of that operator sends strictly positive linear functionals (with respect to that cone) to strictly positive linear functionals. Moreover, positive linear functionals are automatically bounded in some cases (depending on the properties of the cone). We will not have need to use any of this going forward, because we need to compute uniform bounds on the norm of the linear functionals we handle, but it should be noted for the purposes of being aware of other abstract theory. Example 2.1.7. If X is a (real) , y ∈ X is non-zero, and α ∈ (0, kyk), then because y represents the linear functional x 7→ hx, yi, we have that

Cy,α = {x ∈ X : hx, yi ≥ α kxk} \ {0} is a nice cone in X. We make the following claim: If c > 0, then

 c kykq  (cy + {y}⊥) ∩ C = (cy + {y}⊥) ∩ B cy, kyk2 − α2 . y,α α

To see this equality, let x ∈ cy + {y}⊥, and note that hx, yi = c kyk2. Then, we have

c2 kyk4 kx − cyk2 = kxk2 − 2chx, yi + c2 kyk2 ≤ − c2 kyk2 α2 if and only if

(α kxk)2 ≤ c2 kyk4 − 2c2α2 kyk2 + 2cα2hx, yi = (c kyk2)2 = hx, yi2, which is if and only if α kxk ≤ hx, yi, because both quantities are non-negative. This equality indicates that the perpendicular plane to cy intersects Cy,α in a disk-shaped region, and so the cone looks very much like an infinite-dimensional “ice cream” cone (think of waffle cones). The closer α gets to 0, the larger the disk becomes, approaching a half-plane; the closer α gets to kyk, the smaller the disk becomes, approaching the positive ray through y. In this picture, the quantity D is something like an aperture for the cone. The same idea holds for general Cf,α, though without the disk shape, and noting that if we tried to set α = kfk, Cf,α could be empty (there exist bounded linear functions that do not achieve their norm; we avoid this case entirely in our definition).

Informally, we might call all Cf,α “ice cream” cones, because of the Hilbert space case. See Figure 2.1 for an illustration in two dimensions.

17 y 1 ⊥ 2 y + {y}

2 Figure 2.1: The cone Cy,α for y = (1, 2) and α = 2 in R , depicted with the perpendicular plane at cy with c = 1/2.

The proofs of all of the statements in the following lemma are simply appeals to the definitions of the various concepts.

Lemma 2.1.8. If C1 and C2 are two cones in a real Banach space (X, k·k), then C1 ∩C2 is also a cone; if both C1 and C2 are closed or convex or blunt, then C1 ∩ C2 is closed or convex or blunt, respectively. If at least one of C1 or C2 is salient or D-adapted (say, constants D1 and D2, potentially one of which being infinity), then C1 ∩ C2 is salient or D-adapted with D = min{D1,D2}, respectively. If x ∈ C1 ∩ C2 is an interior point to both C1 and C2, then it is an interior point to C1 ∩ C2.

Example 2.1.9. Consider the Banach space C0(X), where X is a non-compact locally compact Hausdorff topological space (for example, X = Z≥0 equipped with the discrete topology), equipped with the supremum norm. Let C ⊆ C0(X) be the cone of positive continuous functions, C = {f ∈ C0(X): f ≥ 0 and f(x) > 0 for some x}. We will show that C is a nice cone with empty interior. By definition 0 ∈/ C, and positive scalar multiples of functions in C remain in C. If f1, f2 ∈ C and λ ∈ (0, 1), then if f1(x) > 0, we see (λf1 + (1 − λ)f2)(x) > 0, so C is convex. The intersection of C and −C is empty, as any such elements of the intersection would be both non-negative and non-positive, hence zero; thus C is salient. We see that C ∪ {0} is closed: if fn converges to f in

C0(X), then f takes all non-negative values. To see that C generates C0(X), observe that  is a lattice order: if f, g ∈ X, then the pointwise max(f, g) is also continuous and is equal to f ∨ g (any function larger than both f and g is at least max(f, g)).

By Lemma 2.1.4, C generates C0(X). Finally, if −g  f  g, then for all x ∈ X,

|f(x)| ≤ g(x), and so kfk∞ ≤ kgk∞. Thus C is 1-adapted to . However, C has empty interior. Let f ∈ C, let  > 0, and find x ∈ X such that 0 ≤ f(x) < /2. Then B (f, ) is not contained in C, since we can construct a continuous k·k∞ function h ∈ B(0, ) ⊆ C0(X) with h(x) > /2 and observe that f − h ∈ B(f, ) but f(x) − h(x) < 0, hence f − h∈ / C. Note that essentially the same arguments show that the cone of positive functions 1 in l (Z≥1) is also a nice cone with empty interior.

18 Example 2.1.10. Let Lip[0, 1] be the Banach space of Lipschitz functions on [0, 1], equipped with the norm

kfkLip = max{kfk∞ , Lip(f)}, and for a > 0, let

 f(x)  C = f ∈ Lip[0, 1] : f(x) > 0 ∀x ∈ [0, 1], ≤ ea|x−y| ∀x, y ∈ [0, 1] . a f(y)

a Then Ca is a nice cone on Lip[0, 1], with D-adapted constant D = 1 + 2ae ; Liverani uses this cone in [33]. cf(x) f(x) Let f ∈ C , and let c > 0. Then = ≤ ea|x−y|, so cf ∈ C . If f, g ∈ C , a cf(y) f(y) a a then by definition f + g takes positive values, and moreover for all x, y ∈ [0, 1] we have f(x) − f(y)ea|x−y| ≤ 0 and g(y)ea|x−y| − g(x) ≥ 0. This implies

f(x) − f(y)ea|x−y| ≤ 0 ≤ g(y)ea|x−y| − g(x), and rearranging the resulting inequality yields

f(x) + g(x) ≤ ea|x−y|, f(y) + g(y) which means f + g ∈ Ca. These two conditions imply that Ca is convex. Of course,

0 ∈/ Ca, and the intersection Ca ∩ −Ca is empty.

To see that Ca generates Lip[0, 1], observe that 1 ∈ Lip[0, 1]. We will show that 1 is an interior point of Ca and apply Lemma 2.1.4. Observe that for f ∈ B(0, 1) and x 6= y ∈ [0, 1], we have:

1 + f(x) f(x) − f(y) Lip(f) |x − y| = 1 + ≤ 1 + 1 + f(y) 1 + f(y) 1 − kfk∞ kfk |x − y| kfk ! ≤ 1 + Lip ≤ exp Lip |x − y| , 1 − kfkLip 1 − kfkLip

t −1 where we use the power series expansion of e . For kfkLip ≤ a(1 + a) , we have that −1 1 1 −1 kfkLip (1 − kfkLip) ≤ a, and so + f ∈ Ca. Hence B( , a(1 + a) ) ⊆ Ca. ∼ Then, if fn ∈ Ca and fn −→ f in Lip[0, 1], then either f = 0, or f is never 0. To n→∞ see this, suppose that f(x) = 0 for some x ∈ [0, 1]. Then for any y ∈ [0, 1] we have

a|y−x| a|y−x| f(y) − f(x)e = lim fn(y) − fn(x)e ≤ 0, n→∞

a|y−x| since fn ∈ Ca for each n. So f(y) ≤ f(x)e = 0, and so f(y) = 0 also, hence f is

19 uniformly 0. In the case where f is never zero, the same limit calculation shows that

f(y) ≤ ea|y−x|, f(x) and so f ∈ Ca. Finally, suppose that f, g ∈ Lip[0, 1], with −f  g  f. Then f(x) − g(x) > 0 and g(x)+f(x) > 0 for all x ∈ [0, 1], implying that f(x) > 0 and |g(x)| < f(x). This yields kgk∞ ≤ kfk∞. Then, for x, y ∈ [0, 1], we have

f(x) − g(x) e−a|x−y| ≤ ≤ ea|x−y|, f(y) − g(y) by both sides of the order relation −f  g  f. We then subtract 1 to see that

f(x) − g(x) − 1 ∈ e−a|x−y| − 1, ea|x−y| − 1 . f(y) − g(y)

Since we also have

−a|x−y| −a|x−y| −a|x−y| a|x−y| a|x−y| e − 1 = 1 − e = e (e − 1) ≤ e − 1, we obtain f(x) − g(x) a|x−y| − 1 ≤ e − 1 f(y) − g(y) for all x, y ∈ [0, 1]. Then we have:

|g(x) − g(y)| = |g(x) − f(x) + f(x) − f(y) + f(y) − g(y)| ≤ |f(x) − f(y)| + |(g(x) − f(x)) − (g(y) − f(y))|

f(x) − g(x) ≤ Lip(f) |x − y| + |f(y) − g(y)| · − 1 f(y) − g(y) ≤ Lip(f) |x − y| + 2 |f(y)| · ea|x−y| − 1 .

By the Mean Value Theorem, if x 6= y then we have ea|x−y| − 1 = |x − y| · aeat for some t ∈ (0, |x − y|). Using the fact that this t is at most 1, we obtain:

a |g(x) − g(y)| ≤ kfkLip |x − y| + 2 kfk∞ |x − y| ae a = kfkLip |x − y| (1 + 2ae ) .

a Thus Lip(g) ≤ (1 + 2ae ) kfkLip, and so we have:

a a kgkLip = max{kgk∞ , Lip(g)} ≤ max{kfk∞ , (1 + 2ae ) kfkLip} = (1 + 2ae ) kfkLip .

20 a Thus Ca is (1 + 2ae )-adapted to k·kLip, as desired. a A different computation, one with power series, can show that Ca is (2e −1)-adapted to k·kLip; this other constant is smaller when a is larger than 1.

The Projective (Pseudo-)Metric Given a cone on a real Banach space that generates a partial order, we can define some- thing like a distance on the cone referring only to the order structure. This quantity is actually a projective pseudo-metric: it captures information about something like an angle between two vectors in the cone, so collinear vectors are distance zero, but near the boundary of the cone the distance grows very large, and the boundary components are infinitely far from other parts of the cone. Moreover, we will see shortly that if the cone is nice, then the distance is closely related to the norm; this relation will give us a way to translate results about operators and cones into analytic properties. Here, we make the requisite definitions and then show that the quantities involved have sufficiently nice properties for use going forward.

Definition 2.1.11. Let (X, k·k , C) be a real Banach space with a salient closed convex cone. For v, w ∈ C, define:

α(v, w) = sup {λ ≥ 0 : λv  w} ; β(v, w) = inf {µ ≥ 0 : w  µv} ; β(v, w) θ(v, w) = log . α(v, w)

The quantity θ(v, w) is the projective (pseudo-)metric on C, and is sometimes called a Hilbert metric on C. If f, g ∈ C have θ(f, g) < ∞, then we say that f and g are comparable.

Recall that v  w by definition means that w − v ∈ C ∪ {0}; we will use both notations interchangeably as required. We gather important properties about α and β in Proposition 2.1.12 and properties of θ in Proposition 2.1.13.

Proposition 2.1.12. Let (X, k·k , C) be a real Banach space with a salient closed convex cone.

• In the second component, α is super-additive, β is sub-additive, and both are positive-scalar-homogeneous, i.e. for all v, w, z ∈ C and c > 0:

α(v, w + z) ≥ α(v, w) + α(v, z), β(v, w + z) ≤ β(v, w) + β(v, z), α(v, cw) = cα(v, w), β(v, cw) = cβ(v, w).

21 • The quantities α and β are increasing in the second component: if v, w, z ∈ C and w  z, then

α(v, w) ≤ α(v, z), β(v, w) ≤ β(v, z).

• The quantities α and β have a symmetry property: for all v, w ∈ C, α(v, w) = β(w, v)−1.

• If v, w ∈ C are comparable, then α(v, w) and α(w, v) > 0, and β(v, w) and β(w, v) < ∞.

• For all v, w ∈ C, we have α(v, w) ≤ β(v, w).

• For all v ∈ C, α(v, v) = 1 = β(v, v). If C is, moreover, D-adapted, then for all v, w ∈ C we have

kwk 1 kvk α(v, w) ≤ D , β(v, w) ≥ . kvk D kwk

Finally, if L : X → X is a linear operator such that LC ⊆ C, then for all v, w ∈ C:

α(L(v),L(w)) ≥ α(v, w), β(L(v),L(w)) ≤ β(v, w).

Proof. Note that C ∪ {0} is closed. This fact implies that w − α(v, w)v ∈ C ∪ {0}, because w − λv ∈ C ∪ {0} for all λ < α(v, w) and w − α(v, w)v is a limit point of these w − λv. We can thus write:

w + z − α(v, w) + α(v, z)v = w − α(v, w)v + z − α(v, z)v ∈ C ∪ {0}, so that α(v, w + z) ≥ α(v, w) + α(v, z). The analogous proof holds for β(v, w)v − w ∈ C ∪ {0} and the β sub-additivity. If c > 0, then we have

α(v, cw) = sup {λ > 0 : λv  cw} λ λ  = c · sup > 0 : v  w = cα(v, w), c c and similarly for β. Suppose that v, w, z ∈ C and w  z. Then

z − α(v, w)v = z − w + w − α(v, w)v ∈ C ∪ {0}, so that α(v, z) ≥ α(v, w). Similarly,

β(v, z)v − w = β(v, z)v − z + z − w ∈ C ∪ {0},

22 so that β(v, w) ≤ β(v, z). 1  Let c > 0; by factoring, we observe that w−cv = c c w − v , so that w−cv ∈ C∪{0} if and only if c−1w − v ∈ C ∪ {0}. If this case never holds, then α(v, w) = 0 and β(w, v) = ∞; otherwise, taking a supremum in c is the same as taking an infimum in c−1, so that α(v, w) = β(w, v)−1. β(v,w) θ(v,w) Suppose that v and w are comparable. Then α(v,w) = e is finite and non-zero, thus both α(v, w) and β(v, w) are finite and non-zero, and by the previous part of the lemma both α(w, v) and β(w, v) are finite and non-zero as well. Observe that we have, for any v, w ∈ C with β(v, w) finite:

(β(v, w) − α(v, w))v = β(v, w)v − w + w − α(v, w)v ∈ C ∪ {0}, by additivity and the computation earlier. Since C is salient, we must have β(v, w) − α(v, w) ≥ 0, or α(v, w) ≤ β(v, w). If β(v, w) is infinite, then clearly α(v, w) ≤ β(v, w). If v ∈ C, then (1 − c)v ∈ C ∪ {0} if and only if c ≤ 1 and (c − 1)v ∈ C ∪ {0} if and only if c ≥ 1, because C is salient. These equivalences show that α(v, v) = 1 = β(v, v). Assume now that C is D-adapted. Let v, w ∈ C and suppose that w − λv be an element of C ∪ {0} for λ > 0; then we have

−w  λv  w, and by the D-adapted property we see that kλvk ≤ D kwk, which rearranges to λ ≤ kwk kwk D kvk . Taking a supremum yields α(v, w) ≤ D kvk . The analogous proof shows that −1 kwk β(v, w) ≥ D kvk . Suppose that L : X → X is a linear operator such that LC ⊆ C By linearity, we have that L also preserves C ∪ {0}, and so for all v, w ∈ C,

L(w) − α(v, w)L(v) = L(w − α(v, w)v) ∈ C ∪ {0}.

Thus α(L(v),L(w)) ≥ α(v, w), and similarly β(L(v),L(w)) ≤ β(L(v),L(w)). Proposition 2.1.13. Let (X, k·k , C) be a real Banach space with a salient closed convex cone. Then θ is a projective pseudo-metric; that is, for all v, w, z ∈ C: • θ(v, w) ≥ 0 (but is sometimes infinite);

• θ(v, w) = 0 if and only if v and w are collinear, i.e. w = cv for some c > 0;

• θ(v, w) = θ(w, v);

• θ(v, w) ≤ θ(v, z) + θ(z, w). If L ∈ B(X) is a linear operator such that LC ⊆ C, then for all v, w ∈ C:

1  θ(L(v),L(w)) ≤ tanh 4 diamθ (LC) θ(v, w),

23 where if the θ-diameter of LC is infinite then the scale factor is 1.

Proof. The majority of the properties of θ follow directly from properties of α and β in Proposition 2.1.12. Let v, w, z ∈ C. Since β(v, w) ≥ α(v, w), we have θ(v, w) = log (β(v, w)α(v, w)−1) ≥ log(1) = 0. On the other hand, since C ∪ {0} is closed and α is a supremum, we know that w − α(v, w)v ∈ C ∪ {0} and

w − α(v, w)v − v = w − (α(v, w) + )v∈ / C ∪ {0} for any  > 0, hence α(v, w − α(v, w)v) = 0 and θ(v, w − α(v, w)v) is infinite. If w = cv for some c > 0, then α(v, w) = cα(v, v) = c = cβ(v, v) = β(v, w), and so β(v,w) 0 θ(v, w) = log(1) = 0. Conversely, if θ(v, w) = 0, then we have α(v,w) = e = 1, and so β(v, w) = α(v, w) = c is finite and non-zero. Then we compute

cv = α(v, w)v  w  β(v, w)v = cv, so because  is a partial order, we have that w = cv. For v, w ∈ C, we have:

β(w, v) β(v, w) θ(w, v) = log = log = θ(v, w), α(w, v) α(v, w) using the symmetry property of α and β (where θ(v, w) = ∞ exactly when θ(w, v) = ∞). To see that the triangle inequality holds, suppose that θ(v, z) + θ(z, w) is finite (if not, then we are done). Then we see:

β(v, z) β(z, w) θ(v, z) + θ(z, w) = log + log α(v, z) α(z, w)  w  z }| { β(v, β(z, w)z) β(v, w)   = log   ≥ log = θ(v, w), α(v, α(z, w)z) α(v, w) | {z } w using the monotonicity properties of α, β, and log. For the contraction property, we follow [33, Section 1]. We have

β(L(v),L(w)) β(v, w) θ(L(v),L(w)) = log ≤ log = θ(v, w) α(L(v),L(w)) α(v, w) for all v, w ∈ C, so if diamθ(LC) is infinite we treat tanh(∞) = 1 and see that the inequality holds. Suppose now that ∆ = diamθ(LC) is finite. If v and w are collinear, then L(v) and L(w) are also collinear, so the inequality holds (both quantities are

24 zero). Otherwise, write a = α(v, w), b = β(v, w),

λ = α(L(w − α(v, w)v),L(β(v, w)v − w)), µ = β(L(w − α(v, w)v),L(β(v, w)v − w)); by the diameter assumption, λ and µ are finite and non-zero. By definition we obtain

λL(w − av)  L(bv − w)  µL(w − av).

These two inequalities rearrange to (λ + 1)L(w)  (b + λa)L(v) and (µ + 1)L(w)  (b + µa)L(v), which imply

(b + µa) (b + λa) α(L(v),L(w)) ≥ , β(L(v),L(w)) ≤ . µ + 1 λ + 1

Writing down the definition of θ(L(v),L(w)) yields:

β(L(v),L(w)) θ(L(v),L(w)) = log α(L(v),L(w)) (b + λa) µ + 1  ≤ log · λ + 1 (b + µa) b/a + λ 1 + λ = log − log b/a + µ 1 + µ eθ(v,w) + λ 1 + λ = log − log . eθ(v,w) + µ 1 + µ

The derivative of log ((et + λ)(et + µ)−1) with respect to t is, after a line of computa- (µ − λ)et tion, . Then, finally, we obtain (see the inequality involving tanh in (et + λ)(et + µ) Lemma A.2.2)

Z θ(v,w) (µ − λ)et θ(L(v),L(w)) ≤ t t dt 0 (e + λ)(e + µ) 1 µ ∆ ≤ tanh log θ(v, w) ≤ tanh θ(v, w). 4 λ 4

Lemma 2.1.14. Let (X, k·k , C) be a real Banach space with a salient closed convex cone. Considering C × C as a subset of X × X with the norm k(x, y)k1 = kxk + kyk and equipping C ×C with the restriction of the Borel σ-algebra on X ×X, we see that α is upper-semi-continuous, β and θ are lower-semi-continuous, and all three are Borel measurable, into ([0, ∞], σ(τ|·|)).

Proof. Let (vn, wn) −→ (v, w) in C ×C (in norm). Let λ < lim supn→∞ α(vn, wn). Find n→∞ a subsequence (vnk , wnk ) such that α(vnk , wnk ) > λ. Then w−λv = limk→∞ wnk −λvnk ∈

25 C ∪ {0}, since by definition of α, wnk − λvnk ∈ C ∪ {0} for each k, and C ∪ {0} is closed. Thus α(v, w) ≥ λ, for all λ < lim supn→∞ α(vn, wn), and so α(v, w) ≥ lim supn→∞ α(vn, wn). This is upper-semi-continuity. Similarly, if µ > lim infn→∞ β(vn, wn), then the same argument (for some subse- quence (vnk , wnk )) shows that β(v, w) ≤ lim infn→∞ β(vn, wn), which is lower-semi- continuity. Finally, to see that θ is lower-semi-continuous, observe that θ = log(β/α) = log(β)− log(α). Since log is increasing and continuous, log(β) is lower-semi-continuous and log(α) is upper-semi-continuous. Then, − log(α) is lower-semi-continuous, and so θ is lower-semi-continuous, as the sum of two lower-semi-continuous functions. Borel measurability follows because upper-semi-continuous functions pull back the set [c, ∞] to closed sets for all c ∈ [0, ∞], and lower-semi-continuous functions pull back [0, c] to closed sets for all c ∈ [0, ∞].

We now take the time to explicitly demonstrate how this setup plays out in the finite-dimensional case, with the standard non-negative cone.

d Example 2.1.15. In Example 2.1.5, we saw that C = R≥0 \{0} is a nice cone. We can explicitly compute α, β, and θ in this situation, which is fairly atypical; in most cases, one can only obtain bounds on these quantities for any two elements of the cone (as we will do for a more complicated cone in Chapter 4). We cannot obtain an exact diameter for the image of the cone under a contracting linear operator, but we can obtain a very useful bound.

Observe that for v, w ∈ C, v  w if and only if vi ≤ wi for all i = 1, . . . , d. We may then easily compute   wi α(v, w) = sup {λ > 0 : λvi ≤ wi for all i} = min : vi, wi not both 0 , vi where we take the minimum over only those indices i where vi and wi are not both zero (since the defining inequality would be vacuously satisfied by any λ), and we treat division by 0 as indicating ∞ (because any λ would work in that case). Similarly, for β we compute   wi β(v, w) = inf {µ > 0 : µvi ≥ wi for all i} = max : vi, wi not both 0 , vi with the same caveats as for α. Therefore we have n o  wi  max : vi, wi not both 0 vi θ(v, w) = log  n o  . wi min : vi, wi not both 0 vi

26 Suppose that P ∈ Md(R≥0) is a non-negative matrix. We have that P C ⊆ C if and only if no column of P is the zero vector. In this case, we can determine exactly when P C has finite diameter, and compute an upper bound for the diameter ∆ in terms of the size of the entries of P . We claim the following: P C has finite diameter exactly when the only zero entries in P occur in rows of zeros, and that in this case ∆ ≤ 2(log(B) − log(A)), where 0 < A ≤ B < ∞ and all of the non-zero entries of P are in the interval [A, B]. If P C has finite diameter, then α(P v, P w) > 0 for all v, w ∈ C. In particular, setting v = ej we have (P ej)i > 0 except when (P ej)i = 0 for all j (so that (P w)i = 0 for all w), which says that zeroes in P only occur in rows of zeroes. Conversely, if P has zeroes only in rows of zeroes, suppose that the non-zero entries of P satisfy

0 < A ≤ Pij ≤ B < ∞. For any v, w ∈ C, we have:   (P w)i α(P v, P w) = min :(P v)i, (P w)i not both 0 (P v)i (Pd ) Pijwj = min j=1 : non-zero rows i of P Pd j=1 Pijvj ( Pd ) A wj A kwk ≥ min j=1 : non-zero rows i of P = 1 . Pd Bkvk B j=1 vj 1

Bkwk A similar calculation yields β(P v, P w) ≤ 1 , and so combining these two in- Akvk1 equalities we find that θ(P v, P w) ≤ 2 log(B/A) = 2(log(B) − log(A)), hence ∆ ≤ 2(log(B) − log(A)).

Analytic Properties of Banach Spaces with Nice Cones Given a real Banach space with a nice cone, we gain strong tools for studying the space, thanks to the relationship between the norm and the cone. We will give three important results for later use; one is an extension theorem, one is a characterization of what it means for a cone to generate the Banach space, and the last is the key link between the projective metric θ and the norm on the space. By a positive function on a Banach space with a salient convex cone, we mean a function that takes non-negative values on the cone. The next lemma describes how to extend well-behaved positive functions on a cone to the subspace spanned by the cone.

Lemma 2.1.16. Let C ⊆ X be a salient convex cone in a real Banach space, and let η : C → R>0 be a positive, positive-scalar-homogeneous, additive function. Then η extends uniquely to a positive linear functional on the vector subspace C − C ⊆ X, by setting η(g1 − g2) = η(g1) − η(g2).

Proof. We need to show that the extension of η is well-defined. Let g1 − g2 = h1 − h2,

27 for g1, g2, h1, h2 ∈ C. Then g1 + h2 = h1 + g2 ∈ C, and so by additivity we have

η(g1) + η(h2) = η(g1 + h2) = η(g2 + h1) = η(g2) + η(h1), and rearranging gives us η(g1) − η(g2) = η(h1) − η(h2), as desired. Additivity of the extension follows from additivity of η. We have η(cg) = cη(g) for non-negative c trivially; for negative c, we write

η(c(g1 − g2)) = η((−c)g2 − (−c)g1) = η((−c)g2) − η((−c)g1)

= (−c)η(g2) − (−c)η(g1)

= c(η(g1) − η(g2)) = cη(g1 − g2).

Hence the extension of η is linear. Positivity of the extension follows directly from positivity of η. Uniqueness of the extension is also straightforward: if f also extends η, then

f(g1 − g2) = f(g1) − f(g2) = η(g1) − η(g2) for any g1, g2 ∈ C.

The following lemma, by Andˆo[1, Lemma 1], indicates that when a cone generates the Banach space, every vector has a bounded (not necessarily continuous or even mea- surable) decomposition into cone vectors. Even if the decomposition is not necessarily measurable, it turns out not to matter as much as the boundedness property.

Lemma 2.1.17 (Andˆo). Let X be a Banach space and let C be a salient closed convex cone in X. Then the following are equivalent:

1. X = C − C, i.e. C generates X;

2. there exists K > 0 such that for all x ∈ X, there are x+, x− ∈ C such that x = x+ − x− and kx±k ≤ K kxk.

The next Proposition provides the key relations between θ and k·k on C. The first part is an inequality that will be used repeatedly, and the second is the generalization of the same statement but for D = 1, found in [33] as Lemma 1.3.

Proposition 2.1.18. Let (X, k·k , C) be a real Banach space with a nice D-adapted cone.

1. If f, g ∈ C are comparable, then   α(g, f) + β(g, f) D θ(f,g)  f − g ≤ kgk α(g, f) e − 1 . 2 2

28 2. If f, g ∈ C with kfk = kgk = r, then kf − gk ≤ D2r eθ(f,g) − 1 .

Thus if (fn)n ⊆ C is a θ-Cauchy sequence of elements with the same norm, then (fn)n is Cauchy in norm, hence convergent.

1 Proof. First, suppose that f, g ∈ C are comparable. Subtract 2 (α(g, f) + β(g, f))g from the inequality α(g, f)g  f  β(g, f)g to get

β(g, f) − α(g, f) α(g, f) + β(g, f) β(g, f) − α(g, f) − g  f − g  g. 2 2 2 Use the D-adapted property to obtain:   α(g, f) + β(g, f) β(g, f) − α(g, f) f − g ≤ D g 2 2 D = kgk α(g, f) eθ(g,f) − 1 . 2 Now, suppose that f, g ∈ C with kfk = kgk = r. If f and g are not comparable, then the norm bound in the Proposition is trivial, since the right-hand-side is infinite. Thus, assume that f and g are comparable. Using the reverse triangle inequality, we have:   α(g, f) + β(g, f) α(g, f) + β(g, f) r · 1 − = kfk − g 2 2   α(g, f) + β(g, f) ≤ f − g . 2

We apply the first part of the proposition, the upper bound for α in Proposition 2.1.12, and the triangle inequality to obtain the desired inequality:     α(g, f) + β(g, f) α(g, f) + β(g, f) kf − gk ≤ f − g + g − g 2 2   α(g, f) + β(g, f) α(g, f) + β(g, f) = f − g + r · 1 − 2 2   α(g, f) + β(g, f) ≤ 2 f − g 2 ≤ D kgk α(g, f) eθ(g,f) − 1 ≤ D2r eθ(g,f) − 1 .

From this inequality, it is clear that if (fn)n is a sequence of cone elements that is Cauchy in θ, then it is Cauchy in norm.

29 2.2 Measurability and Topological Considerations

Because we will be considering functions from a measure space to spaces which may be equipped with multiple topologies, we must decide on the topologies and σ-algebras on the relevant spaces.

• For a Banach space (X, k·k), we use the norm topology and the associated Borel σ-algebra.

• For the dual of a Banach space, X∗, the two main options for topologies are the

weak-* topology τw∗ and the norm topology, where the norm is denoted by k·kX∗ if clarity is required.

• For the bounded linear operators on a Banach space, B(X), we will use either

the strong operator topology τSOT or the norm topology, k·kop (depending on whether or not the space X is separable).

In general, we will write σ(τ) to denote the Borel σ-algebra generated by a topology τ. Observe that the weak-* topology and the strong operator topology are both topologies of pointwise convergence. We will need to consider measurability and continuity of maps from a probability space (with and without a topology) into the of X. Because we have a cone in X that generates X, we can obtain these properties by looking at a space of functions on the cone that contains X∗ and has an appropriate norm. This space is, essentially, the bounded functions on the intersection of the unit sphere of X with the cone.

Definition 2.2.1. Let (X, k·k , C) be a real Banach space with a nice cone. The set of norm-bounded, positive-scalar-homogeneous functions f : C → R is denoted by BPSH(C), and precisely given by

BPSH(C) := {f : C → : f(cx) = cf(x) for all x ∈ C, c > 0; kfk < ∞} R RC where kfk = sup {|f(x)| : x ∈ C, kxk ≤ 1}. RC Lemma 2.2.2. Let (X, k·k , C) be a real Banach space with a nice cone. Then the set (BPSH(C), k·k ) is a normed vector space that contains (X∗, k·k ). If φ ∈ X∗, then RC X∗

kφk ≤ kφk ≤ 2K kφk , RC X∗ RC where K is the constant from Andˆo’sLemma. The norm topology on X∗ is the same as the restriction of the norm topology on BPSH(C).

30 Proof. It is a simple computation to show that BPSH(C) is a vector space and that k·k is a semi-norm. The fact that k·k is non-degenerate follows from the positive- RC RC scalar-homogeneity of the elements of BPSH(C). If φ ∈ X∗, then it is clear that kφk ≤ kφk . To see the other inequality, Andˆo’s RC X∗ Lemma says that there is some K ≥ 1 such that for all x ∈ X, there exist g1, g2 ∈ C such that x = g1 − g2 and kgik ≤ K kxk. By the triangle inequality, for kxk ≤ 1 we obtain:

|φ(x)| ≤ |φ(g1)| + |φ(g2)| = K φ(K−1g ) + K φ(K−1g ) ≤ 2K kφk , 1 2 RC so that kφk ≤ 2K kφk . On X∗, these two norms are equivalent, and so they X∗ RC generate the same topology. The next lemma is a standard measurability result; we give a generalized version of the lemma and proof in Appendix A.2, using Lemma A.2.5 (and the fact that µ is complete, so that redefining f on a set of measure zero does not affect its measurability).

Lemma 2.2.3. Let fn : (Ω, B, µ) → (X, k·k) be a sequence of Borel measurable func- tions from a complete probability space into the Banach space (X, k·k) and suppose that fn converges pointwise µ-a.e. to f in the norm. Then f is also Borel measurable. We will be using two measurability hypotheses for our main theorem: strong mea- surability in the case where our Banach space (X, k·k) is separable and µ-continuity in the case where (X, k·k) is not.

Definition 2.2.4. Let (Ω, B, µ) be a probability space and let (X, k·kX ), (Y, k·kY ) be normed linear spaces. A family of bounded linear operators L :Ω → B(X,Y ) is strongly measurable when it is measurable with respect to the Borel σ-algebra generated by the strong operator topology on B(X,Y ) (that is, pointwise convergence). The next lemma is a characterization of strongly measurable functions into the bounded linear operators between two separable Banach spaces; see Lemma A.4 in [19], where the proof carries over seamlessly for general Y . We will be applying this fact in two main cases: Y = R, so that B(X,Y ) = X∗, and Y = X, so that B(X,X) = B(X).

Lemma 2.2.5. Let (Ω, B, µ) be a probability space, and let (X, k·kX ), (Y, k·kY ) be normed linear spaces. When (X, k·kX ) and (Y, k·kY ) are separable, then L is strongly measurable if and only if Lω(x) is Borel measurable for all x ∈ X. Definition 2.2.6. Let (Ω, B, µ) be a Borel probability space where Ω is a Borel subset of a Polish space (a separable ), let (Y, τ) be a topological space, and let f :Ω → Y be a function. We say that f is µ-continuous when there exists an S increasing sequence of compact sets Kn ⊆ Ω such that µ ( n Kn) = 1 and on each Kn, f is continuous.

31 For Borel probability spaces over a Borel subset of a Polish space, this definition is S equivalent to requesting that there are only measurable sets An such that µ ( n An) = 1 and on which f is continuous, because µ is tight (measurable sets can be approximated from inside by compact sets) and Ω is normal. This weaker condition is usually taken to be the definition of µ-continuity for more general spaces; see [50]. The following lemma is a reworking of Lemma A.2.5. The proof is essentially the same as for Egoroff’s theorem.

Lemma 2.2.7. Let (Ω, B, µ) be a Borel probability space over a Borel subset of a Polish space, let (X, d) be a metric space, and let fn :Ω → X be a sequence of µ-continuous functions. Suppose that fn converges pointwise to f :Ω → X. Then f is also µ- continuous.

Proof. The two key ideas of the proof are that the pointwise convergence of fn to f is almost uniform, and that this uniform convergence can be made to happen on an increasing sequence of compact sets.

The µ-continuity of each fn implies the measurability of each fn and thus the measurability of f, because (X, d) is a metric space. Thus, for each n ≥ 1 and δ > 0, the set

Gn,δ = {ω : d(fn(ω), f(ω)) < δ} is measurable. Moreover, for each δ, limn→∞ µ(Gn,δ) = 1.

−i Let  ∈ (0, 1). Fine an increasing sequence of indices ni such that µ(Gni,2 ) > −(i+1) 1 −i 1 − 2 , and set Ai = Gni,2 . By µ-continuity of each fn, find Ki ⊆ Ω compact 1 −(i+1) such that fni 1 is continuous and µ(Ki ) > 1 − 2 . By tightness of the measure Ki 2 2 −(i+1) 1 2 µ, find Ki ⊆ Ai compact such that µ(Ki ) > 1 − 2 . Set Ki = Ki ∩ Ki ; then Ki is compact, and moreover:    µ(K ) = 1 − µ(Ω \ K1 ∪ Ω \ K2) ≥ 1 − − = 1 − . i i i 2i+1 2i+1 2i T∞ Then, set K = i=1 Ki; K is compact, as a closed subset of the compact set K1, and µ(K) ≥ 1 − . 1 We observe that each fni is continuous on Ki ⊆ Ki , and so is continuous on

K. Moreover, on K, f is a uniform limit of the subsequence of functions fni , since −i d(fni (ω), f(ω)) < 2 for all ω ∈ K. Hence f is the uniform limit of continuous functions on K, therefore continuous on K. Our choice of  was arbitrary, so that K can be made a large as possible; hence f is µ-continuous.

Lastly, we need the notion of tempered functions (see [19, Appendix C]). The name comes from tempered distributions, which have subexponential growth.

32 x σ(x) σ2(x) σ3(x) σ−3(x) σ−2(x) σ−1(x) x

f(3, x) f(3, σ−3(x))

Figure 2.2: Schematics of cocycles along an orbit of x (both non-invertible and invertible).

Definition 2.2.8. Let (Ω, B, µ, σ) be a invertible probability-preserving transforma- tion, and let f :Ω → R. We say that f is tempered when 1 lim log |f(σn(ω))| = 0. n→±∞ n

2.3 Cocycles, Lyapunov Exponents, and the Grassmannian

In order to describe random dynamical systems, we use cocycles; in order to apply the same definition to both compositions of maps and compositions of linear operators, we use a fairly general setting. Note that the word “cocycle” does not seem to be universally well-defined across mathematics (there are cocycles in operator algebras that do not seem to match the way we will use the word).

Definition 2.3.1. Let X be a set and let σ : X → X be a function; let M be a monoid. A map f : Z≥0 × X → M is called a cocycle (over σ) when f(0, x) is the identity in m the monoid and f satisfies f(n + m, x) = f(n, σ (x))f(m, x) for all n, m ∈ Z≥0 and x ∈ X.

(n) As alternative notation, we will write f(n, x) = fx . We often call (X, σ) the base for the cocycle. Observe that if f is a cocycle, then f(n, x) = f(1, σn−1(x)) ··· f(1, x), so f is determined by the values of f(1, ·); conversely, defining a function f(n, x) using the values of f(1, ·) as above defines a cocycle. We can view a cocycle as accumulating the values of f(1, ·) along the orbit of an element x; see Figure 2.2. If Mx ⊆ M is a family of subsets such that f(1, x)Mx ⊆ Mσ(x), then we call Mx equivariant.

Example 2.3.2. If (Ω, µ, σ) is a probability-preserving transformation and for each (0) (n) ω, Tω is a map on some space X, then by setting Tω = idX and Tω := Tσn−1(ω) ◦ (n) · · · ◦ Tω, we have that Tω defines a cocycle (the monoid is maps on the space X, with composition). The choice of map Tω can be seen as random, so this cocycle is how we formalize the notion of random dynamical systems. If moreover, the maps Tω yield (0) (n) Perron-Frobenius operators Lω, then we can define Lω = I and Lω = Lσn−1(ω) ··· Lω

33 (n) to obtain a cocycle of operators Lω ; the monoid is the bounded linear operators with composition.

Because it makes no sense to consider eigenvalues for general cocycles of bounded operators on Banach spaces (a single eigenvalue can only say something about a finite iterate of the cocycle, not the whole cocycle asymptotically), we need a new quantity that describes asymptotic growth of vectors. The replacement notion is that of Lya- punov exponents. Note that we restrict our notion of Lyapunov exponent to cocycles of bounded linear operators; there are ways to consider Lyapunov exponents of maps or dynamical systems without looking at the operators, but we will not consider those. See [17] for a reference. While we will only be considering cocycles over invertible ergodic probability-preserving transformations, we give a general definition.

Definition 2.3.3. Let (X, k·k) be a Banach space, and let L : Z≥0 × Ω → B(X) be a cocycle of bounded linear operators on X over σ. We define the Lyapunov exponent of L in the direction v ∈ X over ω ∈ Ω to be 1 λ(ω, v) := lim sup log kL(n, ω)vk . n→∞ n

The collection of all Lyapunov exponents for L at ω is called the Lyapunov spectrum and 1 is denoted Λ(ω) := {λ(ω, v): v ∈ X}. We also denote λ(ω) := lim sup n log kL(n, ω)k. n→∞

Lemma 2.3.4. Let (X, k·k) be a Banach space, and let L : Z≥0 × Ω → B(X) be a cocycle of bounded linear operators on X over σ. The Lyapunov exponents for L satisfy the following properties, for all ω ∈ Ω.

• λ(ω, 0) = −∞.

• λ(ω, αv) = λ(ω, v) for all v ∈ X and α 6= 0.

• λ(ω, v + w) ≤ max{λ(ω, v), λ(ω, w)} for all v, w ∈ X.

• λ(σ(ω),L(1, ω)v) = λ(ω, v) for all v ∈ X.

We also have λ(ω) = sup Λ(ω). When (Ω, µ, σ) is an ergodic probability-preserving transformation, λ(ω) = λ∗ is µ-almost everywhere constant.

Some of the main results in random dynamical systems theory are a variety of theorems all called multiplicative ergodic theorems. They are the cocycle-equivalent of finding generalized eigenspaces for a matrix; we state a version of the MET following [17] and [19], where the only difference between the two versions of the theorems in the papers is the measurability assumptions on the cocycle and its base. We defer the

34 relevant measurability definitions to Section 2.4, and the precise definition of quasi- compact to [19]. This version of the theorem is often called semi-invertible, because while the operators are not required to be invertible, the base system still is, and we still obtain an equivariant decomposition of the space instead of simply a decreasing flag of subspaces (see the cited papers for details). Theorem 2.3.5. Let (Ω, B, µ, σ, X, k·k ,L) be either a strongly measurable random dy- namical system or a µ-continuous random dynamical system, such that log+ kL(1, ·)k ∈ L1(µ). Suppose that the random dynamical system is quasi-compact, meaning that the cocycle analogue of the essential spectral radius, κ∗ (defined similarly to λ∗ but with the index of compactness instead of the norm) is strictly less than λ∗. Then there exists ∗ a countable collection of Lyapunov exponents λ1 > λ2 > ··· > κ (may be finite or infinite) and a unique measurable equivariant decomposition of X into closed subspaces L 1 X = V (ω) ⊕ i Yi(ω), where the Yi are finite-dimensional, limn→∞ n log kL(n, ω)xk = 1 ∗ λi for all non-zero x ∈ Yi(ω), and limn→∞ n log kL(n, ω)xk ≤ κ for x ∈ V (ω). As per the Introduction, the subspaces are called Oseledets spaces. We are breaking down X into subspaces in which non-zero vectors grow exactly at the rates given by the associated Lyapunov exponents. In this context, we can reinterpret Theorem A as saying that cocycles that preserve and sometimes contract a cone are quasi-compact, and that we have a lower bound on the difference λ1 − λ2. Note that we have not defined what it means for (a family of) subspaces to be measurable. To make that definition, we define the Grassmannian of a Banach space. For reference, see Appendix B of [19] and Appendix B of [50]. Definition 2.3.6. Let (X, k·k) be a Banach space. The Grassmannian of X, denoted G(X), is the set of all closed complemented subspaces of X, equipped with the metric d(V,W ) = dH (V ∩ B,W ∩ B), where dH is the Hausdorff distance and B is the closed unit ball in X. For k ∈ Z≥0, the set of closed k-dimensional subspaces of X is denoted k Gk(X) and the set of closed k-codimensional subspaces of X is denoted G (X); both are closed subsets of G(X). We equip G(X) with the Borel σ-algebra arising from the metric topology. We will refer almost everything about the Grassmannian to the above cited papers, except for the following lemma: Lemma 2.3.7. Let (X, k·k) be a Banach space and let S denote the unit sphere of X.

Then the map spanR : S → G(X) sending v to spanR{v} is continuous.

Proof. Let Bv = spanR{v}∩B and Bw = spanR{w}∩B. Observe that for α, β ∈ [−1, 1], we have

dist(αv, Bw) ≤ kαv − αwk ≤ |α| kv − wk ,

dist(βw, Bv) ≤ kβw − βvk ≤ |β| kv − wk .

35 Then we compute, by definition of the Hausdorff distance:

d(spanR{v}, spanR{w}) = dH (Bv,Bw) ( )

= max sup dist(αv, Bw), sup dist(βw, Bv) α∈[−1,1] β∈[−1,1] ( ) ≤ max sup |α| kv − wk , sup |β| kv − wk = kv − wk . α∈[−1,1] β∈[−1,1]

Hence, spanR is continuous.

2.4 Statement of the Main Theorem

The main theorem is actually two theorems with the same conclusion outside of mea- surability but with drastically different measurability assumptions. Due to this fact, we will describe the two sets of measurability hypotheses first, and then state the theorem.

Definition 2.4.1. Let (Ω, B, µ, σ) be an ergodic, invertible, probability-preserving transformation with µ a complete measure, and let (X, k·k) be a real separable Banach space, with bounded linear operators B(X), and L :Ω → B(X) a strongly measurable map. Then the cocycle L(n, ω) = L(1, σn−1(ω)) ··· L(1, ω) is called a strongly measur- able cocycle over (Ω, B, µ, σ). We will call the tuple (Ω, B, µ, σ, X, k·k ,L) a strongly measurable random dynamical system.

Definition 2.4.2. Let (Ω, τ) be homeomorphic to a Borel subset of a Polish space, µ a complete Borel probability measure on (Ω, σ(τ)), and σ :Ω → Ω a homeomorphism so that (Ω, µ, σ) is a ergodic, invertible, probability-preserving transformation. Let (X, k·k) be a real Banach space (not necessarily separable), and let L :Ω → B(X) be µ-continuous with respect to the norm topology on B(X). Then the cocycle L(n, ω) = L(1, σn−1(ω)) ··· L(1, ω) is called a µ-continuous cocycle over (Ω, σ(τ), µ, σ). We will call the tuple (Ω, B, µ, σ, X, k·k ,L) a µ-continuous random dynamical system.

Theorem 2.4.3. Let (Ω, B, µ, σ, X, k·k ,L) be either a strongly measurable random dy- namical system or a µ-continuous random dynamical system, such that log+ kL(1, ·)k ∈ L1(µ). Let C ⊆ X be a nice cone such that L(1, ω)C ⊆ C for all ω. Suppose that there exists a positive measure subset GP of Ω, a positive integer kP , and a positive real  number DP such that for all ω ∈ GP , diamθ L(kP , ω)C ≤ DP . Then there exists a σ-invariant set of full measure Ω˜ ⊆ Ω on which the following statements are true:

1. There exists a unique function v : Ω˜ → C, and a positive measurable function ˜ φ : Ω → R>0 such that •k v(ω)k = 1,

36 • log+(φ) ∈ L1(µ), and • L(1, ω)v(ω) = φ(ω)v(σ(ω)).

2. There exists a family of bounded linear functionals ω 7→ η(ω, ·): Ω˜ → X∗ such that

• η(ω, ·) is strictly positive with respect to C, • η(σ(ω),L(1, ω)x) = φ(ω)η(ω, x) for all x ∈ X and ω ∈ Ω˜, and • η(ω, v(ω)) = 1.

Thus X = spanR{v(ω)} ⊕ ker(η(ω, ·)) is an equivariant decomposition of X with respect to L(n, ω) and the map πω : X → X given by πω(x) = η(ω, x)v(ω) is the

continuous projection onto spanR{v(ω)}. 3. If (Ω, B, µ, σ, X, k·k ,L) is strongly measurable, then:

• v is B-σ(k·k) measurable, • η(ω, ·) is strongly measurable,

• the projection operators πω, I − πω are strongly measurable, and

• spanR{v(ω)} is B-σ(d) measurable. If (Ω, B, µ, σ, X, k·k ,L) is µ-continuous, then:

• v is µ-continuous with respect to k·k = k·kX ,

• η(ω, ·) is µ-continuous with respect to k·kX∗ ,

• the projection operators πω and I −πω are µ-continuous with respect to k·kop, and

• spanR{v(ω)} and ker(η(ω, ·)) are µ-continuous with respect to the Grass- mannian distance.

4. The Lyapunov exponent for v(ω) is

n−1 1 1 X lim log kL(n, ω)v(ω)k = lim log(φ(σi(ω))) n→∞ n n→∞ n i=0 1 = λ∗ = lim log kL(n, ω)k , n→∞ n

37 where λ∗ is the maximal Lyapunov exponent for L(n, ω) (possibly −∞), and we have

n−1 ! 1 1 X i lim sup log L(n, ω) ker(η(ω,·)) − log(φ(σ (ω))) n→∞ n n i=0  −1! µ(GP ) 1 ≤ − log tanh DP < 0. kP 4

Z If λ∗ > −∞, then λ∗ has multiplicity 1 with λ∗ = log φ dµ, and the projection Ω operators πω and I − πω are norm-tempered in ω. Remark 2.4.4. Theorem 2.4.3 will be proved without any reference to any version of the Multiplicative Ergodic Theorem (for example, Theorem 2.3.5). It therefore provides a way to show quasi-compactness of certain dynamical systems and verify the hypotheses of the MET in order to use it. Moreover, in this case, the “top” or “fast” space, the equivariant space along which all non-zero vectors grow at the fastest rate ∗ λ1 = λ , is the one-dimensional span of v(ω), and the “slow” space, the equivariant space along which all vectors grow at a rate slower than λ∗, is the kernel of η(ω, ·). In addition, we state a corollary that provides a sufficient condition for the existence of the set GP and quantities kP and DP in the hypotheses of Theorem 2.4.3. This con- dition is a generalization of the primitivity condition in the classical Perron-Frobenius theorem, where a power of a matrix with non-negative entries has all positive entries. Instead, we require that over a positive measure set of ω, L(n, ω) eventually strictly contracts a cone. Corollary 2.4.5. Let (Ω, B, µ, σ, X, k·k ,L) be either a strongly measurable random dynamical system or a µ-continuous random dynamical system. Let C ⊆ X be a nice cone such that L(1, ω)C ⊆ C for all ω. Suppose that

nP (ω) = inf {k ≥ 1 : diamθ(L(k, ω)C) < ∞} is finite on a set of positive measure. Then nP is finite µ-almost everywhere, and there exists a positive measure subset GP of Ω, a positive integer kP , and a positive real number DP such that for all ω ∈ GP ,  diamθ L(kP , ω)C ≤ DP .

Thus, if in addition log+ kL(1, ·)k ∈ L1(µ), then Theorem 2.4.3 applies. As mentioned in the Introduction, there is an assortment of theorems similar to Theorem 2.4.3, all with different assumptions in terms of the measurability or invert- ibility or compactness of the cocycle of operators, or the extent of the cone contraction.

38 The primitivity condition in Corollary 2.4.5 seems to be the direct generalization of the classical primitivity condition to the cocycle case, and we use the cone contraction formulation in place of the “all positive entries” requirement to extend from matri- ces to bounded linear operators on a Banach space. The requirement in the theorem for explicit constants is what supplies the power to upper bound growth rates, but the corollary is sufficient to show that a cocycle is quasi-compact and that an MET applies.

2.5 Proof of Theorem 2.4.3 and Corollary 2.4.5

A key ingredient of many proofs of Perron-Frobenius-type theorems is contraction of a cone or a family of cones. The first lemma shows that the θ-diameter of the image of the cone under an iterate of the cocycle is a measurable function. The following proposition gives a quantitative estimate on the contraction of the cone in terms of its θ-diameter along both forwards and backwards orbits of σ. The lemma afterwards establishes a minimum rate of contraction. The first proof relies on the hypotheses placed on the random dynamical system. The latter two proofs are entirely combinatorial and ergodic theoretic, in the sense that we only use the ergodic properties of σ and the algebraic and order-theoretic properties of the cone.

Lemma 2.5.1. Let (Ω, B, µ, σ, X, k·k ,L) be either a strongly measurable random dy- namical system or a µ-continuous random dynamical system. Then for all k ≥ 1, ω 7→ diamθ(L(k, ω)C) is a measurable function into R≥0 ∪ {∞}. Proof. Fix k ≥ 1. First assume that we are in the strongly measurable case (so that X ∞ is separable). Observe that C is separable, with some countable dense subset {xn}n=1. For any M > 1, we have that \ {ω ∈ Ω : diamθ(L(k, ω)C) ≤ M} = {ω ∈ Ω: θ(L(k, ω)xi,L(k, ω)xj) ≤ M} , i,j≥1 because L(k, ω) is a continuous map on X and θ is lower-semi-continuous on C×C. Since the map ω 7→ L(k, ω) is strongly measurable, we see that ω 7→ θ(L(k, ω)xi,L(k, ω)xj) is measurable for each i, j. Thus ω 7→ diamθ(L(k, ω)C) is measurable. Next, assume that we are in the µ-continuous case. Find a sequence of disjoint compact sets Kn ⊆ Ω on which L(k, ω) is continuous. Consider the set of operators P ⊆ B(X) which preserve C, equipped with the subspace norm topology, and define

D : P → [0, ∞] by D(L) = diamθ(LC). We claim that D is lower-semi-continuous.

To see this, suppose that Ln −→ L in P, let M = lim infn→∞ D(Ln), and find a n→∞ subsequence Lnk such that M = limk→∞ D(Lnk ). By the lower-semi-continuity of θ,

39 for any x, y ∈ C we have:

θ(Lx, Ly) ≤ lim inf θ(Lnx, Lny) n→∞

≤ lim inf θ(Ln x, Ln y) k→∞ k k

≤ lim inf D(Ln ) = M. k→∞ k

Taking a supremum over all x and y yields lower-semi-continuity of D. Finally, we have that diamθ(L(k, ω)C) = D(L(k, ω)) is the composition of a continuous function and a lower-semi-continuous function, which is lower-semi-continuous and thus measurable on each compact Kn, therefore measurable on Ω. Proposition 2.5.2. Let (Ω, B, µ, σ, X, k·k ,L) be either a strongly measurable random dynamical system or a µ-continuous random dynamical system, such that log+ kL(1, ·)k is in L1(µ). Let C ⊆ X be a nice cone such that L(1, ω)C ⊆ C for all ω. Suppose that there exists a positive measure subset GP of Ω, a positive integer kP , and a positive real number DP such that for all ω ∈ GP ,  diamθ L(kP , ω)C ≤ DP .

Then there exists a σ-invariant set of full measure Ω˜ ⊆ Ω and measurable functions ± ˜ j : Ω × Z≥0 → Z≥0 that are non-decreasing and tend to infinity in n for all fixed ω, ˜ ± such that for any ω ∈ Ω, and n ≥ kP + min {n ≥ 0 : j (ω, n) ≥ 1}, we have:

+ D j (ω,n)−1 diam (L(n, ω)C) ≤ tanh P · D , θ 4 P − D j (ω,n)−1 diam (L(n, σ−n(ω))C) ≤ tanh P · D . θ 4 P

−1 Proof. By Poincar´eRecurrence applied to both σ and σ , µ-almost every point in GP returns infinitely often to GP both forward and backward in time; call the set of these ˜ S∞ −n ˜ points G, and let Ω = n=−∞ σ (G). We have that Ω is σ-invariant, with measure 1 by ergodicity of (µ, σ). If ω ∈ Ω,˜ then ω is in the orbit of a point in G, which means that σn(ω) enters + GP infinitely often in both the forward and backward directions. Let {nl (ω)}l≥1 n+(ω) be the sequence of non-negative indices such that σ l (ω) ∈ GP , and similarly we − − −nl (ω) let {nl (ω)}l≥1 be the sequence of positive indices such that σ (ω) ∈ GP . For notational purposes, we set

+  + + −  − − l (ω, n) = nl (ω): nl (ω) ≤ n , l (ω, n) = nl (ω): nl (ω) ≤ n to denote the numbers of these indices.

40 n n When σ (ω) ∈ GP , we know that L(kP , σ (ω)) contracts the cone C by a minimum amount, by the assumption on GP , kP , and DP . We want to count the number of times this event happens, going both forwards and backwards. We therefore define two new + − + − sequences {mj (ω)}j≥1 and {mj (ω)}j≥1 by taking subsequences of nl and nl where consecutive terms are at least kP apart. Specifically, we set

+ + m1 (ω) = n1 (ω), +  + + + mj (ω) = min nl (ω): nl (ω) ≥ mj−1(ω) + kP (j > 1); −  − − m1 (ω) = min nl (ω): nl (ω) ≥ kP , −  − − − mj (ω) = min nl (ω): nl (ω) ≥ mj−1(ω) + kP (j > 1).

In the forward direction, we count the number of cone contractions in n steps by + counting the number of terms of mj that are at most n − kP , to allow for the kP steps afterwards. In the backward direction, we count the number of cone contractions in n − steps by simply counting the number of mk terms that are at most n, because we have − already accounted for the kP steps in the definition of m1 . In notation:

+  + + −  − + j (ω, n) = mj (ω): mj (ω) ≤ n − kP , j (ω, n) = mj (ω): mj (ω) ≤ n .

It is straightforward to see that for fixed ω ∈ Ω,˜ j±(ω, n) is non-decreasing in n and tends to infinity as n grows arbitrarily large. + By the definition of mj (ω), we have that

 m+(ω)  diam L(kP , σ j (ω))C ≤ DP .

m+(ω) We then let Cj = L(kP , σ j (ω)). Then, using the cocycle property we can write, + for n ≥ m1 + kP ,

L(n, ω) = Bj+(ω,n)+1Cj+(ω,n)Bj+(ω,n) ··· C2B2C1B1, where the B terms do not necessarily contract distances in C but still preserve them (using Proposition 2.1.13). We now repeatedly utilize Proposition 2.1.13, to see that

41 for any v, w ∈ C:

θ(L(n, ω)v, L(n, ω)w) = θ Bj+(ω,n)+1Cj+(ω,n)Bj+(ω,n) ··· C2B2C1B1v,  Bj+(ω,n)+1Cj+(ω,n)Bj+(ω,n) ··· C2B2C1B1w + D j (ω,n)−1 ≤ tanh P θ (C B v, C B w) 4 1 1 1 1 + D j (ω,n)−1 ≤ tanh P · D . 4 P

DP  The multiple powers of tanh 4 arise from each block Ci in the product, since that block contracts the C to θ-diameter at most DP . Taking a supremum over all v, w ∈ C yields the statement for the forward direction. The proof for the backward direction − is completely analogous, instead using the backward direction indices mj and the counting function j−.

Lemma 2.5.3. For ω ∈ Ω˜ and l± as defined in the proof of Proposition 2.5.2, we have:

l+(ω, n) l−(ω, n) + 1 j+(ω, n) + 1 ≥ , j−(ω, n) + 1 ≥ . kP kP

+ Proof. By the definition of mj (ω), we have:

j+(ω,n)+1  + + [ + + + nl (ω): nl (ω) ≤ n ⊆ {mj (ω), mj (ω) + 1, ··· , mj (ω) + kP − 1}. j=1

The first inequality follows by taking cardinalities. − By the definition of mj (ω), we have

j−(ω,n)  − − [ − − − ni (ω): kP ≤ ni ≤ n ⊆ {mj (ω), mj (ω) + 1, ··· , mj (ω) + kP − 1}, j=1

− and l (ω, n)−kP +1 is at most the cardinality of the smaller set. The second inequality follows by taking inequalities and rearranging.

By Birkhoff’s Theorem, we know that n−1l+(ω, n) and n−1l−(ω, n) both converge

µ-almost everywhere to µ(G) = µ(GP ). This fact provides the exponential rate of contraction of the cone by L(n, ω) in n, in both directions. We now give the proof of the theorem. Parts 1, 2, and 4 will be proven in order, and statements in part 3 will be proven throughout as appropriate.

Proof of Theorem 2.4.3(1). We construct v(ω) ∈ X by constructing a Cauchy sequence and proving it converges to something with the correct properties; we use similar ideas

42 ˜ to those found in [2, 17, 19]. Given ω ∈ Ω, choose some g ∈ C and define vn(ω) ∈ X for n ≥ 0 by L(n, σ−n(ω))g v (ω) = . n kL(n, σ−n(ω))gk

Each vn(ω) has unit norm. By Proposition 2.5.2 and the scale-invariance of θ, we see that for n ≥ m:

−m −n θ(vm(ω), vn(ω)) = θ(L(m, σ (ω))g, L(n, σ (ω))g) = θ(L(m, σ−m(ω))g, L(m, σ−m(ω))L(n − m, σ−(n−m)(ω))g) − D j (ω,m)−1 ≤ tanh P · D . 4 P

− Since j tends to infinity in n and θ is symmetric, we see that {vn(ω)}n is a Cauchy sequence in θ, and hence Cauchy in norm by Proposition 2.1.18. Let v(ω) be the limit of this sequence. Then kv(ω)k = 1 and v(ω) ∈ C. Moreover, observe that L(n, σ−n(ω))C is a decreasing chain of sets with θ-diameter decreasing to zero. Since θ is lower-semi-continuous, we see that

 k·k −n −n diamθ L(n, σ (ω))C = diamθ(L(n, σ (ω))C).

k·k T∞ −n Thus we see that n=0 L(n, σ (ω))C is a norm-closed set with θ-diameter zero; this −n set contains v(ω), as vn(ω) ∈ L(n, σ (ω))C for each n, and so it is the non-negative ray containing v(ω). If we had used a different initial vector g0 and obtained the vector v0(ω), we would have v0(ω) collinear with v(ω) and having the same norm, which shows they are equal. Hence v(ω) is independent of the initial choice of g. Suppose that (Ω, B, µ, σ, X, k·k ,L) is strongly measurable. By the results in Ap- pendix A in [19], we see that L(n, σ−n(ω)) is strongly measurable as the composition of the strongly measurable functions L(1, σ−i(ω)), which implies that L(n, σ−n(ω))g is measurable with respect to the norm on X. By measurability of the norm, we see that each vn(ω) is measurable, and so the limit function is also measurable, as (X, k·k) is a metric space. Now, suppose that (Ω, B, µ, σ, X, k·k ,L) is µ-continuous. By continuity of the Ba- nach algebra multiplication on B(X) and the operator norm, we see that L(n, σ−n(ω)), L(n, σ−n(ω))g, and kL(n, σ−n(ω))k are also µ-continuous. In addition, kL(n, σ−n(ω))k is bounded away from zero on compact sets where it is continuous, so that each vn(ω) is µ-continuous, and thus by Lemma 2.2.7, v(ω) is µ-continuous.

43 To see that v(ω) is equivariant, we have (since L(1, ω) is continuous): ! \ k·k L(1, ω)v(ω) ∈ L(1, ω) L(n, σ−n(ω))C n=0 \  k·k ⊆ L(1, ω) L(n, σ−n(ω))C n=0 \ k·k ⊆ L(n + 1, σ−n+1(σ(ω)))C n=0

This last set has θ-diameter equal to 0 and contains v(σ(ω)), as shown above. Thus we see that L(1, ω)v(ω) = kL(1, ω)v(ω)k v(σ(ω)); set φ(ω) = kL(1, ω)v(ω)k, so that L(1, ω)v(ω) = φ(ω)v(ω). It is clear that φ(ω) ≤ kL(1, ω)k and that φ(ω) is measurable (in either set of hypotheses), so log+ φ ∈ L1(µ).

Proof of Theorem 2.4.3(2). For ω ∈ Ω,˜ let L˜(1, ω) = φ(ω)−1L(1, ω). Then L˜(1, ω) preserves C and satisfies L˜(1, ω)v(ω) = v(σ(ω)). We will construct, for each ω, a linear functional on X. So fix ω, and for any g ∈ C and n ≥ 0, we apply Proposition 2.1.12 to see that

α(v(σn(ω)), L˜(n, ω)g) ≤ α(v(σn+1(ω)), L˜(n + 1, ω)g) ≤ β(v(σn+1(ω)), L˜(n + 1, ω)g) ≤ β(v(σn(ω)), L˜(n, ω)g).

By Proposition 2.5.2, we know that there is some N = N(ω) such that diamθ(L(N, ω)C) is finite, which implies that

0 < α(v(σN (ω)), L˜(N, ω)g) ≤ β(v(σN (ω)), L˜(N, ω)g) < ∞, and so by Proposition 2.1.12 the two sequences are monotonic and bounded, thus convergent. Moreover, for n ≥ N we have

n β(v(σ (ω)), L˜(n, ω)g) ˜ ˜ 1 ≤ = eθ(L(n,ω)v(ω),L(n,ω)g) ≤ ediamθ(L(n,ω)C), α(v(σn(ω)), L˜(n, ω)g) and Proposition 2.5.2 shows that the right side of the inequality converges to 1. Let η(ω, g) be the shared limit of α(v(σn(ω)), L˜(n, ω)g) and β(v(σn(ω)), L˜(n, ω)g). We now show that η(ω, ·) extends to a bounded linear functional on X. By Proposition 2.1.12 and linearity of L˜(n, ω), α(v(σn(ω)), L˜(n, ω)g) is positive-scalar- homogeneous in g, so that η(ω, g) is also, and η(ω, g) is positive because it is larger than each of the terms α(v(σn(ω)), L˜(n, ω)g). We also see that η(ω, ·) is additive on C,

44 by using super-additivity of α and sub-additivity of β and taking limits. In particular, for any n ≥ 0 and g1, g2 ∈ C:

n ˜ n ˜ α(v(σ (ω)), L(n, ω)g1) + α(v(σ (ω)), L(n, ω)g2) n ˜ ≤ α(v(σ (ω)), L(n, ω)(g1 + g2)

≤ η(ω, g1 + g2) n ˜ ≤ β(v(σ (ω)), L(n, ω)(g1 + g2)) n ˜ n ˜ ≤ β(v(σ (ω)), L(n, ω)g1) + β(v(σ (ω)), L(n, ω)g2).

Taking limits in n yields η(ω, g1 + g2) = η(ω, g1) + η(ω, g2). We then appeal to Lemma 2.1.16 to extend η(ω, ·) uniquely to a linear functional on X.

To see that η(ω, ·) is bounded, let g ∈ C, and let N be such that diamθ(L(N, ω)C) is finite. We have:

η(ω, g) ≤ β(v(σN (ω)), L˜(N, ω)g)   ˜ N L(N, ω)g ˜ = β v(σ (ω)),  · L(N, ω)g ˜ L(N, ω)g   L˜(N, ω)g ˜ N diamθ(L(N,ω)C) ˜ ≤ α v(σ (ω)),  e L(N, ω) kgk ˜ L(N, ω)g ˜ diamθ(L(N,ω)C) ˜ ≤ De L(N, ω) kgk .

By Lemma 2.2.2, we can bound the norm of η by its norm in BPSH(C), to obtain:

˜ kη k ≤ 2K kη k ≤ 2KDediamθ(L(N,ω)C) L˜(N, ω) . ω X∗ ω RC

Thus η(ω, ·) is bounded. Directly by the definition, we have

η(ω, v(ω)) = lim α(v(σn(ω)), L˜(n, ω)v(ω)) = lim α(v(σn(ω)), v(σn(ω))) = 1. n→∞ n→∞

For g ∈ C, we have

η(σ(ω),L(1, ω)g) = φ(ω) lim α(v(σn(σ(ω))), L˜(n, σ(ω))L˜(1, ω)g) n→∞ = φ(ω) lim α(v(σn+1(ω)), L˜(n + 1, ω)g) n→∞ = φ(ω)η(ω, g), and by linearity this equality extends to g ∈ X. We have already seen that η(ω, ·) is

45 strictly positive on C.

Equivariance of the decomposition X = spanR{v(ω)}⊕ker(η(ω, ·)) follows from the equivariance properties of v(ω) and η(ω, ·). It is clear to see that πω(·) = η(ω, ·)v(ω) is the projection with range spanR{v(ω)} and kernel ker(η(ω, ·)), and it is continuous because η(ω, ·) is.

We now prove two technical lemmas that allow us to prove the two remaining parts of the theorem.

Lemma 2.5.4. In the setting of Theorem 2.4.3, if ω ∈ Ω˜ then there exists N = N(ω) such that for all g ∈ C and n ≥ N,

L˜(n, ω)g  β(v(σN (ω)), L˜(N, ω)g)v(σn(ω)).

N ˜ Proof. Find N(ω) such that diamθ(L(N, ω)C) is finite. Then β(v(σ (ω)), L(N, ω)g) is finite and L˜(N, ω)g  β(v(σN (ω)), L˜(N, ω)g)v(σN (ω)). Apply L˜(n − N, σN (ω)) to both sides to obtain the conclusion of the lemma.

Lemma 2.5.5. In the setting of Theorem 2.4.3, if ω ∈ Ω˜ then there exists N = N(ω) ≥

1 and Cω > 0 such that for all g ∈ C and n ≥ N, diamθ(L(n, ω)C) is finite and

˜ n diamθ(L(n,ω)C)  L(n, ω)g − η(ω, g)v(σ (ω)) ≤ Cω kgk e − 1 .

(D+1)D2 One such constant is C = ediamθ(L(N,ω)C) L˜(N, ω) . ω 2

Proof. Let ω ∈ Ω,˜ and let g ∈ C. Let a(n, ω) = α(v(σn(ω)), L˜(n, ω)g), b(n, ω) = β(v(σn(ω)), L˜(n, ω)g). Since a(n, ω) ≤ η(ω, g) ≤ b(n, ω) for all n, the distance from η(ω, g) to the midpoint of [a(n, ω), b(n, ω)] is at most half the length of the interval. In

˜ addition, by Proposition 2.1.12 we have a(n, ω) ≤ D L(n, ω)g . By the first part of

46 Proposition 2.1.18, we have:

kL˜(n, ω)g − η(ω, g)v(σn(ω))k   1 n ≤ L˜(n, ω)g − (a(n, ω) + b(n, ω)) v(σ (ω)) 2   1 n n + (a(n, ω) + b(n, ω)) v(σ (ω)) − η(ω, g)v(σ (ω)) 2

D  θ(v(σn(ω)),L˜(n,ω)g)  a(n, ω) + b(n, ω) ≤ a(n, ω) e − 1 + − η(ω, g) 2 2

D + 1  n  ≤ a(n, ω) eθ(v(σ (ω)),L˜(n,ω)g) − 1 2 (D + 1)D ≤ L˜(n, ω)g ediamθ(L(n,ω)C) − 1 . 2

˜ By Lemma 2.5.4 and the D-adapted condition, we have that L(n, ω)g is at most D · β(v(σN (ω)), L˜(N, ω)g). By definition, β = αeθ on C × C, and so we get

˜ N ˜ diamθ(L(N,ω)C) diamθ(L(N,ω)C) ˜ L(n, ω)g ≤ α(v(σ (ω)), L(N, ω))e ≤ De L(N, ω) kgk .

The proof is complete upon substituting this upper bound into the inequality for

˜ n L(n, ω)g − η(ω, g)v(σ (ω)) . We now prove part 3 of the main theorem. The primary difficulty lies in the µ- continuous case, because the definition of η(ω, ·) is in terms of α and β terms, and these two functions are not continuous on the entirety of C × C, only on int(C) × C. Instead, we prove that η(ω, ·) is a well-behaved limit of a much nicer function when restricted to C. The strongly measurable case is much simpler.

Proof of Theorem 2.4.3(3). First, assume that (Ω, B, µ, σ, X, k·k ,L) is strongly mea- surable. We have already seen that v(ω) is measurable into (X, k·k). To see that η(ω, ·) is strongly measurable (measurable with respect to the weak* σ-algebra), let g ∈ C. Then the map ω 7→ α(v(σn(ω)), L˜(n, ω)g) is measurable into (R, |·|), by Propo- sition 2.1.12 and the strong measurability of L. Then ω 7→ η(ω, g) is the pointwise limit of these functions, taking values in a metric space, and hence measurable. If x ∈ X, then write x = g1 − g2 for some g1, g2 ∈ C, so that η(ω, x) = η(ω, g1) − η(ω, g2); observe that η(ω, x) is the difference of two measurable functions, hence measurable. Thus

η(ω, ·) is strongly measurable. The operators πω and I − πω are strongly measurable (that is, measurable with respect to the strong operator σ-algebra) because η(ω, ·) is strongly measurable. The map from S(0, 1) ⊆ X to the Grassmannian G(X) given by v 7→ spanR{v} is continuous (Lemma 2.3.7), so that ω 7→ spanR{v(ω)} is measurable. Next, assume that (Ω, B, µ, σ, X, k·k ,L) is µ-continuous. By Lemma 2.2.7, we see

47 that v(ω) is µ-continuous, since each vn(ω) is µ-continuous. As just mentioned, taking the span of v(ω) is continuous, so that spanR{v(ω)} is µ-continuous as well. To see that ω 7→ η(ω, ·) is µ-continuous, we use the machinery developed to relate functions on the cone to linear functionals. Since L(1, ω) is µ-continuous, for fixed n we see that L(n, ω) is continuous on compact subsets of Ω with arbitrarily large measure. Applying operators to norm-bounded vectors is a norm-continuous operation on B(X), and so because L(n, ω)v(ω) is never zero, we see that

L(n, ω) ω 7→ L˜(n, ω) = kL(n, ω)v(ω)k is a µ-continuous map into (B(X), k·k). Suppose that L˜(n, ω) is continuous on some ˜ large compact K ⊆ Ω. For ω1, ω2 ∈ K and g ∈ C with kgk ≤ 1, we then have:

˜ ˜ ˜ ˜ ˜ ˜ L(n, ω1)g − L(n, ω2)g ≤ L(n, ω1)g − L(n, ω2)g ≤ L(n, ω1) − L(n, ω2) , which can be made as small as desired by taking ω arbitrarily close to ω . Thus 2 1 ω 7→ L˜(n, ω)· is a µ-continuous map into (BPSH(C), k·k ). RC Then, by the reverse triangle inequality and Lemma 2.5.5, we see that for ω ∈ Ω,˜ there exists N = N(ω) such that for all g ∈ C with kgk ≤ 1 and n ≥ N, we have:

˜ ˜ n L(n, ω)g − η(ω, g) ≤ L(n, ω)g − η(ω, g)v(σ (ω)) (D + 1)D2 ≤ ediamθ(L(N,ω)C) L˜(N, ω) kgk ediamθ(L(n,ω)C) − 1 2 (D + 1)D2 ≤ ediamθ(L(N,ω)C) L˜(N, ω) ediamθ(L(n,ω)C) − 1 . 2

As n tends to infinity, ediamθ(L(n,ω)C) − 1 tends to 0 by Proposition 2.5.2. Therefore

L˜(n, ω)· converges pointwise in ω to η(ω, ·) in (BPSH(C), k·k ). By Lemma 2.2.7, RC ω 7→ η(ω, ·) is µ-continuous into the space (BPSH(C), k·k ); since (X∗, k·k ) is a RC C∗ topological subspace of (BPSH(C), k·k ) by Lemma 2.2.2, we see that ω 7→ η(ω, ·) is RC µ-continuous into X∗.

The operators πω and I − πω are µ-continuous because η(ω, ·) is. By Proposition

B.3.2 in [50], the subspaces spanR{v(ω)} = ker(I − πω) and ker(η(ω, ·)) = ker(πω) are both µ-continuous with respect to the Grassmannian distance, since the kernel map is norm-continuous on projections.

Proof of Theorem 2.4.3(4). First, we prove that for ω ∈ Ω,˜ v(ω) has the largest Lya- punov exponent of any vector in X for L(n, ω). Let x ∈ X \{0}, and by Andˆo’sLemma

(2.1.17), find g1, g2 ∈ C such that x = g1 − g2 and kgik ≤ K kxk. By Lemma 2.5.4 and multiplying by kL(n, ω)v(ω)k, there exists N = N(ω) such that for all n ≥ N and

48 i = 1, 2,

N ˜ n L(n, ω)gi  β(v(σ (ω)), L(N, ω)gi) kL(n, ω)v(ω)k v(σ (ω)) N ˜ = β(v(σ (ω)), L(N, ω)gi)L(n, ω)v(ω).

By the D-adapted condition, we see that

N ˜ kL(n, ω)gik ≤ D · β(v(σ (ω)), L(N, ω)gi) kL(n, ω)v(ω)k .

We may then bound above the Lyapunov exponent for x over ω:

1 1 lim sup log kL(n, ω)xk ≤ lim sup log (kL(n, ω)g1k + kL(n, ω)g2k) n→∞ n n→∞ n    1 N ˜ N ˜ ≤ lim sup log β(v(σ (ω)), L(N, ω)g1) + β(v(σ (ω)), L(N, ω)g2) n→∞ n 1  + log kL(n, ω)v(ω)k n 1 = lim sup log kL(n, ω)v(ω)k . n→∞ n

We now show that L˜(n, ω) restricted to the kernel of η(ω, ·) has an exponential growth rate strictly less than 0. By Lemma 2.5.5, for any ω ∈ Ω˜ there exists N = N(ω) such that for all n ≥ N and g ∈ C, we have

˜ n L(n, ω)g − η(ω, g)v(σ (ω)) (D + 1)D2 ≤ ediamθ(L(N,ω)C) L˜(N, ω) kgk ediamθ(L(n,ω)C) − 1 . 2 Since X = C −C, we apply Andˆo’sLemma to get K ≥ 1 such that for any x ∈ X, there exist g1, g2 ∈ C such that x = g1 − g2 and kgik ≤ K kxk. Suppose that x ∈ ker(η(ω, ·)). Then, by the triangle inequality we obtain:

˜ ˜ n L(n, ω)x = L(n, ω)x − η(ω, x)v(σ (ω)) 2K(D + 1)D2 ≤ ediamθ(L(N,ω)C) L˜(N, ω) kxk ediamθ(L(n,ω)C) − 1 . 2 Only the diameter term depends on n; let F = F (ω) be the other term. By Proposition

49 DP  2.5.2 and Lemma 2.5.3 applied in order, we have (because tanh 4 ∈ (0, 1)):

 j+(ω,n)−1 ! tanh DP ·D ˜ 4 P L(n, ω)x ≤ F (ω) e − 1

+ !  l (ω,n)/kP −2 tanh DP ·D ≤ F (ω) e 4 P − 1 .

Note that log(ex − 1) is asymptotically equivalent to log(x) as x tends to 0 and that −1 + ˜ n l (ω, n) tends to µ(GP ) for all ω ∈ G, by Birkhoff’s theorem. Taking logarithms, dividing by n, and taking a lim sup, we thus have:

1 lim sup log L˜(n, ω)x n→∞ n + !  l (ω,n)/kP ! 1 F (ω)DP 1 DP ≤ lim sup log 2 + log tanh n→∞ n DP  n 4 tanh 4 1 l+(ω, n)  D  = lim sup · log tanh P n→∞ kP n 4 ! µ(G ) D −1 = − P log tanh P , kP 4 where we use an explicit negative sign to indicate the sign of the quantity. The in-

˜ equality in the theorem statement follows by rewriting L(n, ω) . ∗ 1 Next, assume that λ = limn→∞ n log kL(n, ω)k is finite. Then by Proposition ∗ 14 in [17], we see that the Lyapunov exponent λ1(ω) for v(ω) is equal to λ for all ω ∈ Ω.˜ Moreover, Lemma A.1.5 tells us that because the Birkhoff sums converge µ- 1 ∗ R almost everywhere, log(φ) ∈ L (µ), and so λ = Ω log(φ) dµ. As well, the bound on the Lyapunov exponent for L˜(n, ω) on ker(η(ω, ·)) and the equivariance of the ∗ decomposition show that the top exponent λ1 = λ has multiplicity one (corresponding to spanR{v(ω)}). Finally, we want to show that the projections are norm-tempered, that is, kπωk and kI − πωk are tempered functions. To do this, we observe that

˜ diamθ(L(N,ω)C) ˜ 1 ≤ kπωk = kη(ω, ·)k ≤ 2KD e L(N, ω) ,

n where we may choose N = N(ω) to be the first time that σ (ω) enters GP (for n ≥ 1). Then the exponential term is at most eDP for every ω, and it remains to deal with the

50

˜ L(N(ω), ω) term. We have:

1  kL(N(σn(ω)), σn(ω))k  0 ≤ log n kL(N(σn(ω)), σn(ω))v(σn(ω))k N(σn(ω))−1 1 X n+i n+i  ≤ log L(1, σ (ω)) − log(φ(σ (ω))) . n i=0

By Lemma A.1.4, this last sum converges to 0 for µ-almost every ω, because the Birkhoff sums for the two individual sums converge to the same finite quantity and N satisfies

−1 ˜ n n the hypothesis of the lemma, by Lemma A.1.3. Thus n log L(N(σ (ω)), σ (ω)) converges to 0, so that the norm of πω is tempered. Clearly, kI − πωk ≤ 1 + kπωk, so the norm of I − πω is also tempered. The proof of Theorem 2.4.3 is complete.

Proof of Corollary 2.4.5. We see that nP is a measurable function, from (Ω, B) to Z≥0∪ {∞}, by observing that first that by Lemma 2.5.1, ω 7→ diam(L(k, ω)C) is measurable, −1 and if Gk = diamθ(L(k, ·)C) [0, ∞), then

( K ) X nP (ω) = inf {k ≥ 1 : diamθ(L(k, ω)C) < ∞} = sup 1 − χGk . K≥1 k=1

Now, the key observation for the Corollary is that nP satisfies nP (ω) ≤ nP (σ(ω)) + 1 for all ω ∈ Ω. To see this fact, first note that if nP (σ(ω)) = ∞, then the inequality is trivial. Next, if nP (σ(ω)) is finite, then we have, using Proposition 2.1.13,

diamθ(L(nP (σ(ω)) + 1, ω)C) = diamθ(L(nP (σ(ω)), σ(ω))L(1, ω)C)

≤ diamθ(L(nP (σ(ω)), σ(ω))C) < ∞.

−1 Thus nP (ω) ≤ nP (σ(ω)) + 1. This implies that nP (Z≥1) is σ-invariant, and thus measure 0 or 1 because σ is ergodic. By hypothesis, nP is finite on a set of positive measure, and so nP is finite µ-almost everywhere.  −1 Let kP = min k ≥ 1 : µ(nP {k}) > 0 , which exists because nP is finite µ-almost everywhere. Observe that

∞ −1 [ nP {kP } = {ω ∈ Ω : diamθ(L(kP , ω)C) ≤ M} , M=1 and the union is one of increasing sets, so that one of them has positive measure, say at M0. Set GP = {ω ∈ Ω : diamθ(L(kP , ω)C) ≤ M0} and DP = M0; this selection of kP ,GP ,DP satisfy the hypotheses of Theorem 2.4.3.

51 2.6 Easy Applications of Theorem 2.4.3

We apply Theorem 2.4.3 to four situations in finite dimensions; by Example 2.1.15, if a real matrix has all positive entries, then it contracts the non-negative cone. We take advantage of this fact to illustrate the application of Theorem 2.4.3 and to show the strength, or lack thereof, of the computational estimate of the spectral gap. Note that a map into Md(R) is strongly measurable if and only if it is measurable in each entry of the matrix, so that all of the cocycles we are about to write down are strongly measurable, hence yielding strongly measurable random dynamical systems.

Example 2.6.1. Let P be the matrix from Figure 1.1 in the Introduction,

0.5 0.2 0.1 0.1 0.3 0.6 0.1 0.1   P =   . 0.1 0.1 0.4 0.4 0.1 0.1 0.4 0.4

Standard linear algebra techniques show that the spectrum of P is

 3 3  σ(P ) = 1, , , 0 , 5 10 and that corresponding normalized right eigenvectors (with respect to the 1-norm) are

3/14 −1/6 −1/2  0   2/7  −1/3  1/2   0          v =   , v1 =   , v2 =   , v3 =   .  1/4   1/4   0  −1/2 1/4 1/4 0 1/2

The vector v is the stationary distribution for the Markov chain; the other eigenvectors correspond to coherent structures that decay at the different rates corresponding to the eigenvalues. The vector v1 is −1 times an invariant density for the upper-left two-by- two block with 1 times an invariant density for the lower-right two-by-two block; this quantity represents the decay due to mass transfer between the states {1, 2} and the states {3, 4} (the leakage from Figure 1.4). The vectors v2 and v3 are also eigenvectors for the upper-left and lower-right blocks (padded with zeroes), representing structures within the chain. Moreover, the left eigenvector corresponding to 1 is w = [1 1 1 1] (which we think of as a positive linear functional). Observe that the eigenvectors v1, v2, v3 all satisfy w(vi) = 0; thus, every vector x without a component in the v direction (those satisfying 3n w(x) = 0) satisfies kP nxk = O . Here, then, is our spectral gap. Note that 1 5 these are eigenvalues; we can take logarithms to see that the Lyapunov spectrum of

52 (the cocycle) P is {0, log(3/5), log(3/10), −∞}. (n) Consider the cocycle Pω defined over the one-point space Ω = {∗} equipped (n) n with a point mass and the identity map, given by P∗ = P (simply including the deterministic case in the random setting). We use the non-negative cone C, as in Examples 2.1.5 and 2.1.15. The entries of P are between 0.1 and 0.6, so on a set

GP = {∗} of measure 1, P contracts C in kP = 1 step, and we have

diamθ(P C) ≤ 2(log(0.6) − log(0.1)) = 2 log(6) =: DP .

Hence, we see that the second-largest Lyapunov exponent λ2 satisfies ! 1 2 log(6)−1 λ ≤ λ − log tanh 2 1 1 4 elog(6) + 1 7 5 = 0 − log = − log = log . elog(6) − 1 5 7

We have log(5/7) ≈ −0.3365 and log(3/5) ≈ −0.5108, so we see that the upper bound for λ2 is not sharp. For this deficiency, we can blame our chosen upper bound, DP : there is more contraction of the cone than is captured by our loose estimate on the diameter using the largest and smallest entries of P .

Example 2.6.2. Suppose that again in the deterministic case, P = [v ··· v] for v ∈ C with 1-norm equal to 1. We see that the diameter of P C is zero, because the image is simply the line through v. Then we have ! 1 0−1 λ ≤ 0 − log tanh = −∞. 2 1 4

Here, the bound is sharp; since P x = (1 · x)v for all x, the kernel of P is the 1- codimensional space ker(1T ), so all vectors whose components sum to zero are annihi- lated in one step.

Example 2.6.3. Once more in the deterministic case, suppose that P is the 3-by-3 matrix given by   0.3 0.5 0   P =  0 0.5 1 . 0.7 0 0 Using software, we can compute powers of P , to see that:     0.09 0.4 0.5 0.377 0.245 0.4 2   3   P =  0.7 0.25 0.5 ,P =  0.56 0.48 0.25 . 0.21 0.35 0 0.063 0.275 0.35

53 So, we take GP = {∗} and kP = 3, with DP = 2(log(0.56) − log(0.063)) = 2 log(80/9). Then Theorem 2.4.3 says that ! 1 2 log(80/9)−1 λ ≤ λ − log tanh 2 1 3 4 1 elog(80/9) + 1 = 0 − log 3 elog(80/9) − 1 1 89/9 1 89 = − log = − log ≈ −0.0753. 3 71/9 3 71

However, the classical Perron-Frobenius theorem says that P has a unique positive eigenvector corresponding to the eigenvalue 1 (since P is column-stochastic) and there is a spectral gap arising from the eigenvalues of P . Computing the spectrum of P in √ 1 this case yields {1, 10 (−1 ± i 34)}, and so P has a real canonical form of   1 0 0 √ 0 −1/10 34/10 .  √  0 − 34/10 −1/10

If x is in the two-dimensional invariant subspace, then v u 2 √ !2 u−1 34 kP xk ≤ t + kxk = p7/20 kxk 2 10 10 1 1

(note that p7/20 is the modulus of the complex eigenvalues of P ). Hence, we find an upper bound for the Lyapunov exponents for the vectors in the two-dimensional invariant subspace: √ 1 n 1  n  λ(∗, x) = lim sup log kP xk1 ≤ lim sup log 3 kP xk2 n→∞ n n→∞ n r 7 1  7  ≤ log = log ≈ −0.5249. 20 2 20

On top of the already limited power of the contraction, we see that because P does −1 not contract the cone at every step, we lose a factor of kP to the contraction over kP steps. Example 2.6.4. Let us examine a cocycle of matrices for which we can explicitly compute the equivariant density and the Lyapunov exponents. Let Ω = {0, 1}Z be the bi-infinite sequence space on two symbols, with the usual completion of the Borel

σ-algebra generated by cylinder sets (which we will denote by C(a0 ··· an−1), meaning those sequences whose zero-th through (n−1)-st symbols are a0 through an−1). Endow

54 Ω with the p-(1 − p) product measure µ that gives measure p to C(0) and measure 1−p to C(1) (extended to the whole σ-algebra using Kolmogorov consistency), and let 2 σ be the left shift on Ω, where (σ(ω))i = ωi+1. Let  ∈ (0, 1), and let A :Ω → Mn(R ) be given by  " #  1 −  0 A0 = ω0 = 0,   1 A(ω) = " #  1  A1 = ω0 = 1.  0 1 −  The function A then generates a cocycle over σ. The idea here is that with probability p, A(1, ω) leaks  mass from state 0 to state 1, and the rest of the time A(1, ω) leaks  mass in the other direction. We can explicitly compute equivariant families of vectors for A; in general, computing explicit equivariant vectors is not feasible, but in this case we can take advantage of the relative simplicity of the cocycle and the underlying base space. First, we apply Theorem 2.4.3 to A, which is a strongly measurable cocycle into d 2 B(R ) = Md(R). We continue to use the non-negative cone C ⊆ R ; here, observe that " #" # " # 1 −  0 1  1 −   − 2 A A = = , 0 1  1 0 1 −   1 −  + 2 " #" # " # 1  1 −  0 1 −  + 2  A A = = . 1 0 0 1 −   1  − 2 1 − 

In our cylinder set notation, we see that A(2, ω) = A0A1 when ω ∈ C(01) and A(2, ω) =

A1A0 when ω ∈ C(10), so that on GP = C(01)∪C(10) (which has measure 1/4+1/4 =

1/2), A(2, ω) contracts C. Set kP = 2 and

1 −  + 2  D = 2 log(1 −  + 2) − log( − 2) = 2 log , P  − 2 where as usual we use the bound from Example 2.1.15 (even though it is not sharp). Then we know that A has an equivariant density corresponding to the largest Lyapunov exponent for A; the largest Lyapunov exponent is zero, because the 1-norm is preserved by A(1, ω) for all non-negative vectors and the operator norm of each A(1, ω) is equal to 1 (for the same reason). Moreover, there exists a one-dimensional space on which A

55 grows no faster than λ2, which is bounded above by (using Theorem 2.4.3): ! 1/2 1  1 −  + 2 −1 λ ≤ 0 − log tanh log 2 2 2  − 2

1−+2 ! 1 2 + 1 = − log − 4 1−+2 −2 − 1 1  1  1 = − log = log(1 − 2(1 − )), 4 1 − 2 + 22 4 after some computation. At the beginning of this example, we mentioned that we can explicitly find equivari- ant vectors for A; we do that now. Given the description of A in the previous paragraph, T one might expect that v2(ω) = [1 −1] is equivariant, because in both cases  mass is moved from one side to the other, and the signs should work out appropriately. We are lucky enough that our expectation is correct:

" #" #  1 −  0 1  ω0 = 0,   1 −1 A(ω)v2(ω) = " #" #  1  1  ω0 = 1,  0 1 −  −1 " #  1 −   ω0 = 0,   − 1 = " #  1 −   ω0 = 1,  −(1 − )

= (1 − )v2(σ(ω)).

For every ω, we see that log(1 − ) = λ(ω, v2(ω)) is a Lyapunov exponent for A (over ω). So, we have found the codimension-one decaying space; what is the equivariant density corresponding to Lyapunov exponent zero? The proof of Theorem 2.4.3(1) could illustrate it, but it is quite hard to estimate A(n, σ−n(ω)). Instead, observe that P i+1 1 −1 i≤−1(1 − ) = 1−(1−) =  is finite, and note that we know the entire “past” of the orbit of ω (since the shift is invertible). So, with a helping of divine inspiration, try the following vector:

 P (1 − )i+1 v (ω) =  {i≤−1 : ωi=1}  . 1  P (1 − )i+1 {i≤−1 : ωi=0}

56 By the summation we just wrote down, v1(ω) is positive with unit norm, and it is a mea- surable function of ω because each component is a pointwise limit of the partial sums

(all of which are measurable). We then wish to show that A(1, ω)v1(ω) = v1(σ(ω)); the easiest way to show this fact is to start with v1(σ(ω)) and rewrite it in the two cases ω0 = 0 and ω0 = 1. Recalling that σ(ω)i = ωi+1, we re-index the summations in v1(σ(ω)) by setting j = i + 1 to obtain:

 P (1 − )−(i+1)  P (1 − )−j v (σ(ω)) =  {i≤−1 : σ(ω)i=1}  =  {j≤0 : ωj =1}  . 1  P (1 − )−(i+1)  P (1 − )−j {i≤−1 : σ(ω)i=0} {j≤0 : ωj =0}

First, suppose that ω0 = 0. In the first component, we see that X X X (1 − )−j = (1 − )−j = (1 − ) (1 − )−(j+1).

{j≤0 : ωj =1} {j≤−1 : ωj =1} {j≤−1 : ωj =1}

In the second component, since ω0 = 0 the j = 0 term gives us an extra term equal to P −(j+1) 1 in the summation. However, we know that 1 = / =  j≤−1(1 − ) , and so pulling out a factor of 1 −  in the sum as in the first component we obtain: X X X (1 − )−j = (1 − ) (1 − )−(j+1) +  (1 − )−(j+1)

{j≤0 : ωj =0} {j≤−1 : ωj =0} j≤−1 X X = (1 − )−(j+1) +  (1 − )−(j+1).

{j≤−1 : ωj =0} {j≤−1 : ωj =1}

Putting these two equations together yields the desired relation:

 (1 − ) P (1 − )−(j+1)  v (σ(ω)) =  {j≤−1 : ωj =1}  1  P (1 − )−(j+1) +  P (1 − )−(j+1) {j≤−1 : ωj =0} {j≤−1 : ωj =1}  P j+1 " # (1 − ) 1 −  0 = {j≤−1 : ωj =1}  = A(1, ω)v (ω).  1  P (1 − )j+1 1 {j≤−1 : ωj =0}

The case where ω0 = 1 is similar; the extra 1 term goes into the first component of

57 v1(σ(ω)) instead of the second, and so we simply have:

 P (1 − )−(j+1) +  P (1 − )−(j+1) v (σ(ω)) = {j≤−1 : ωj =1} {j≤−1 : ωj =0}  1  (1 − ) P (1 − )−(j+1)  {j≤−1 : ωj =0}  P j+1 " # (1 − ) 1 1 −  = {j≤−1 : ωj =1}  = A(1, ω)v (ω). 0 1  P (1 − )j+1 1 {j≤−1 : ωj =0}

In summary, we have directly found equivariant vectors v1(ω) and v2(ω) that yield directly the Lyapunov exponents for A. We can compare our estimate on λ2 with the exact value, as in previous examples. Observe that for  ∈ (0, 1),

1 log(1 − ) < log(1 − 2(1 − )) < 0, 4 by multiplying by 4, taking exponentials, subtracting the left side from the right, and rearranging:

1 − 2 + 22 − (1 − )4 = 2 − 42 + 43 − 4  1  = 2 1 − 2(1 − ) − 4 > 0. 2

Given that we happen to be in a MET situation now (the cocycle is quasi-compact), we could have also used Lemma A.1.1 to see that the second-largest of the two Lyapunov exponents is exactly log(1 − ): the determinant of each A(1, ω) is (1 − ), and so Z λ2 = λ1 + λ2 = log |det(A(1, ω))| dµ = p log(1 − ) + (1 − p) log(1 − ) = log(1 − ). Ω The last thing to see here is that as  tends to 0, the second-largest Lyapunov exponent also tends to 0, and moreover the asymptotic rate is linear: log(1 − ) ∼ −. The upper-bound on the second-largest Lyapunov exponent is not as good, but it is still linear: 1 1  log(1 − 2(1 − )) ∼ − ((1 − )) ∼ − . 4 2 2 This type of asymptotic comparison will come back in Section 4.6 and Chapter 5.

We will apply Theorem 2.4.3 in one more situation before we move on to our high-powered application, once we have developed some theory of bounded variation functions and proven our new Lasota-Yorke inequality; see Example 4.4.2.

58 Chapter 3 Balanced Lasota-Yorke-type Inequality

We wish to apply Theorem 2.4.3 to a cocycle of Perron-Frobenius operators arising from a cocycle of maps like the one in Figure 1.2. Traditionally (as mentioned in the Introduction), we consider the operators as acting on spaces of bounded variation functions, and a primary tool is a bound on the variation of the image of a function under the operator in terms of a positive linear combination of the variation of the original function and its 1-norm:

Var(Lf) ≤ CVar Var(f) + C1 kfk1 .

The first of these inequalities was proven by Lasota and Yorke in 1973, and there have been other such inequalities proven since then. Here, we list the original Lasota-Yorke inequality and a sample of other inequalities that will be relevant for our purposes.

1. Lasota-Yorke, 1973 [29]: T :[a, b] → [a, b] piecewise C2, expanding, finitely many N 0 branches {In}n=1, λ = inf{|T |}. Then:

 00  ( 0 −1 ) 2 |T | supx∈In |T | CVar = ,C1 = sup 2 + 2 sup . λ |T 0| 1≤n≤N Leb(In)

2. Rychlik, 1983 [48]: T :[a, b] → [a, b] piecewise homeomorphism, expanding,

countably many branches {In}n, g = 1/J (reciprocal of the Jacobian) on the

interiors of each In and 0 on the endpoints, g bounded variation, α a finite partition of [a, b] into closed intervals. Then:   VarJ (g) CVar = kgk + max {VarJ (g)} ,C1 = max . ∞ J∈α J∈α Leb(J)

3. Eslami-G´ora, 2013 [12]: T :[a, b] → [a, b] piecewise C1,1, expanding, finitely

59 N 0 0 many branches {In}n=1 with In = (an, bn), λn = infIn {|T |}, Mn = LipIn (T ). n 1H (a1,+) 1H (a2,+) o n 1H (bN ,−) 1H (bN−1,−) o Set η1 = max , , ηN = max , , and ηn = λ1 λ2 λN λN−1 n 1 1 o max H (bn−1,−) , H (an+1,+) for 1 < n < N (where H is the set of “hanging” λn−1 λn+1 points, meaning the branch of T at that point from that direction does not end at a or b). Then:     1 Mn 2ηn CVar = max + ηn ,C1 = max 2 + . n λn n λn Leb(In)

In our situation, we will be trying to use a Lasota-Yorke-type inequality to show that a particular cone is preserved by an entire family of Perron-Frobenius operators. Unfortunately, the cocycles we will consider have the issue that their intervals of mono- tonicity can be arbitrarily small, which means a uniform bound using either the original Lasota-Yorke inequality or the Eslami-G´orainequality is impossible. Similarly, the re- ciprocal of the derivative can have variation larger than what we require on choices of finitely many intervals in Rychlik’s inequality unless we choose the intervals very small, which causes the same problem. (See Remark 4.0.2 for more information on why these inequalities fail.) Therefore, we need to prove a new inequality that allows us to use it flexibly over a family of operators. The inequality will trade sharpness in the variation constant CVar for a better and more effective bound C1 that will allow us to obtain uniform estimates; this trade-off is why we call it a “balanced” Lasota-Yorke inequality. The goal of this chapter is to do just that, by providing the necessary background and then stating and proving the inequality, Proposition 3.2.10.

3.1 Bounded Variation and Setting

We formally define the space of bounded variation as a vector subspace of L∞; in particular, we use equivalence classes of functions instead of actual functions. Many of the standard results about bounded variation functions carry over immediately, but some require a bit of care. We accept this trouble in order to reap the benefits of working directly in the correct Banach space from the beginning. Note that in our definition, we use real-valued functions and equivalence classes thereof. Moreover, while we could define the space of bounded variation only using a totally ordered set, for our purposes we require an order complete set, and so we will restrict to that situation.

Basic Definitions and Properties Definition 3.1.1. Let (X, ≤) be a totally ordered, order complete set, equipped with its order topology. For a function f : X → R and a set C ⊆ X, we define the variation

60 of f on C, denoted VarC (f), by

( n ) X n VarC (f) := sup |f(xi) − f(xi−1)| : {xi}i=0 ⊆ C, xi ≥ xi−1 . i=1

We write Var(f) = VarX (f), and call it the variation of f. Equip (X, ≤) with a complete Borel probability measure λ, and temporarily let F be the collection of λ-almost-everywhere equivalence classes of measurable functions on X. For [f] ∈ F, define VargC [f] by

VargC [f] = inf {VarC (g)}, g=f a.e. with Var[g f] = VargX [f]. We define the space BV (λ) (with X implicit) to be n o BV (X) := [f] ∈ F : Var(g f) < ∞ .

Finally, define k[f]kBV by setting k[f]kBV := k[f]k1 + Var[g f].

For the remainder of this section, let (X, ≤, λ) be as in Definition 3.1.1. Let k[f]k∞,C denote ess sup(|f|) := inf {r ≥ 0 : λ {x ∈ C : |f(x)| > r} = 0} , C and similarly recall that

ess inf(|f|) := sup {r ≥ 0 : λ {x ∈ C : |f(x)| < r} = 0} . C

∞ The usual norm on L (λ) is k[f]k∞ = k[f]k∞,X .

Lemma 3.1.2. Let f : X → R have finite variation, and let C ⊆ X. Then f is bounded, with

sup {|f(x)| : x ∈ C} ≤ inf {|f(x)| : x ∈ C} + VarC (f).

If [f] ∈ BV (λ), then [f] ∈ L∞(λ) and

k[f]k ≤ ess inf(|f|) + VargC [f]. ∞,C C

Thus BV (λ) is a subset of L∞(λ).

Proof. For any a, b ∈ C, we have

|f(a)| − |f(b)| ≤ ||f(a)| − |f(b)|| ≤ |f(a) − f(b)| ≤ VarC (f),

61 by definition, so we obtain |f(a)| ≤ |f(b)| + VarC (f). Take a supremum over a in C and an infimum over b in C to obtain

sup {|f(x)| : x ∈ C} ≤ inf {|f(x)| : x ∈ C} + VarC (f).

For [f] ∈ BV (λ), observe that for any g ∈ [f] such that Var(g) is finite, we have the above inequality. Then we obtain

ess sup(|g|) − ess inf(|g|) ≤ sup {|g(x)| : x ∈ C} − inf {|g(x)| : x ∈ C} ≤ VarC (g). C C

Noting that the class [|f|] is equal to |[f]| and that the bounded quantity is independent of choice of g, we take an infimum over g ∈ [f] to get

ess sup(|f|) − ess inf(|f|) ≤ VargC [f]. C C

Applying this in the case of C = X yields k[f]k∞ ≤ ess infC (|f|) + Var[g f] < ∞, so that [f] ∈ L∞(λ), and so BV (λ) is a subset of L∞(λ).

Lemma 3.1.3. For f, g : X → R each with bounded variation, C ⊆ X, and d ∈ R, we have:

VarC (df) = |d| VarC (f),

VarC (f + g) ≤ VarC (f) + VarC (g),

VarC (fg) ≤ sup {|f(x)| : x ∈ C} VarC (g) + sup {|g(x)| : x ∈ C} VarC (f).

For [f], [g] ∈ BV (λ), then we have:

 VargC d[f] = |d| VargC [f],  VargC [f] + [g] ≤ VargC [f] + VargC [g],  VargC [f][g] ≤ k[f]k∞,C VargC [g] + k[g]k∞,C VargC [f].

Thus BV (λ) is a vector subspace of L∞(λ) (with the usual addition and scalar multi- plication).

Proof. Let x0 < x1 < . . . xn be elements of C. Then we have

n n X X |df(xi) − df(xi−1)| = |d| |f(xi) − f(xi−1)| , i=1 i=1 and taking a supremum over all finite sets of points in C yields VarC (df) = |d| VarC (f).

62 Similarly, for f and g we have

n n n X X X |(f + g)(xi) − (f + g)(xi−1)| ≤ |f(xi) − f(xi−1)| + |g(xi) − g(xi−1)| i=1 i=1 i=1

≤ VarC (f) + VarC (g).

Taking a supremum over all x0 < . . . xn yields VarC (f + g) ≤ VarC (f) + VarC (g). Finally, we have

n X |(fg)(xi) − (fg)(xi−1)| i=1 n n X X ≤ |f(xi)| |g(xi) − g(xi−1)| + |g(xi−1)| |f(xi) − f(xi−1)| i=1 i=1 n X ≤ sup {|f(x)| : x ∈ C} |g(xi) − g(xi−1)| i=1 n X + sup {|g(x)| : x ∈ C} |f(xi) − f(xi−1)| i=1

≤ sup {|f(x)| : x ∈ C} VarC (g) + sup {|g(x)| : x ∈ C} VarC (f).

Taking a supremum over all x0 < ··· < xn yields

VarC (fg) ≤ sup {|f(x)| : x ∈ C} VarC (g) + sup {|g(x)| : x ∈ C} VarC (f).

For equivalence classes [f], [g] ∈ BV (λ), first observe that

 VargC d[f] = inf {VarC (dg)} = inf {|d| VarC (g)} = |d| VargC [f]. g=f a.e. g=f a.e.

Next, pick f1 ∈ [f] and g1 ∈ [g] with finite variation, and observe that  VargC [f] + [g] = VargC [f + g] ≤ VarC (f1 + g1) ≤ VarC (f1) + VarC (g1).

This inequality holds for all versions f1 of f and g1 of g with finite variation, so take  an infimum over both f1 and g1 to obtain VargC [f] + [g] ≤ VargC [f] + VargC [g]. These two facts, in the case where C = X, show that BV (λ) is a vector subspace of L∞(λ).

Finally, given f1 ∈ [f] with finite variation, define f1,C by setting ( f1(x), |f1(x)| ≤ ess supC (|f1|), f1,C (x) = sgn(f1(x)) ess supC (|f|), |f1(x)| > ess supC (|f1|).

63 Then sup {|f1,C (x)| : x ∈ C} = k[f]k∞,C , and because we have

|f1,C (x) − f1,C (y)| ≤ |f1(x) − f1(y)| for all x, y ∈ C (since we are truncating f1), we have VarC (f1,C ) ≤ VarC (f1). Moreover, we have f1,C ∈ [f], by definition (and completeness of λ). We apply these facts twice to obtain:

 VargC [f][g] = VargC [fg] ≤ VarC (f1,C g1,C )

≤ sup {|f1,C (x)| : x ∈ C} VarC (g1,C )

+ sup {|g1,C (x)| : x ∈ C} VarC (f1,C )

≤ k[f]k∞,C VarC (g1) + k[g]k∞,C VarC (f1).

Take an infimum over f1 ∈ [f] and g1 ∈ [g] with finite variation to obtain  VargC [f][g] ≤ k[f]k∞,C VargC [g] + k[g]k∞,C VargC [f].

There are a number of norms one could place on BV (λ); we choose one that we will use throughout.

Lemma 3.1.4. The quantity Varg = VargX : BV (λ) → R is a seminorm that is zero exactly on the constant equivalence classes, and k·kBV := k·k1 + Varg is a norm on BV (λ). Proof. By Lemma 3.1.3, Varg is a seminorm on BV (λ). Suppose that Var[g f] = 0. −1 Choose, for each n ∈ Z≥1, fn ∈ [f] such that Var(fn) < n . Choose cn in the interval [infx{fn(x)}, supx{fn(x)}]; we will show that the sequence of cn is Cauchy, and that the constant function taking value equal to the limit of the cn is an element of [f]. For all x ∈ X, we have

1 |fn(x) − cn| ≤ sup{fn(x)} − inf{fn(x)} ≤ Var(fn) < . x x n

Since each fn is an element of the class [f], find Ω ⊆ X with full measure and f0 ∈ [f] such that for all x ∈ Ω and n ≥ 1, fn(x) = f0(x). Let  > 0, and observe that for m, n ≥ 1 such that m−1 + n−1 < , we have

1 1 |c − c | ≤ |c − f (x)| + |f (x) − f (x)| + |f (x) − c | < + 0 + < . m n m m m n n n m n

−1 Hence (cn) is a Cauchy sequence, convergent to c∞. Choosing N ≥ 1 such that N <

/2 and |cn − c∞| < /2 for all n ≥ N, we see that for x ∈ Ω,

1  |f (x) − c | = |f (x) − c | ≤ |f (x) − c | + |c − c | < + < , 0 ∞ n ∞ n n n ∞ n 2

64 and so we have f0(x) = c∞ for all x ∈ Ω. Thus f0 is constant λ-a.e., and so c∞1 ∈ [f] (that is, [f] “is” constant). Conversely, it is clear that if [f] contains a constant function, then Var[g f] = 0, since   1 1 Var( ) = 0. We have shown that ker Varg = spanR{[ ]}.

Finally, note that k·k1 is a norm on BV (λ). The sum of two seminorms is a seminorm, and if one of the seminorms is non-degenerate, then the sum is also non- degenerate, hence k·kBV := k·k1 + Varg is a norm on BV (λ). An important fact about bounded variation functions is that they are differences of monotone functions. This fact implies that one-side limits of these functions exist at every point in their domain, and two-sided limits exist at all but countably many points. Moreover, we can lift these statements about functions to statements about equivalence classes. The first lemma can be found in [45].

Lemma 3.1.5. Let f : X → R have finite variation. Then f = f1 − f2, where f1, f2 : X → R are (weakly) increasing functions on X with finite variation.

Proof. Let f1(x) = Var[aX ,x](f). Then f1 is weakly increasing. Set f2 = f1 − f, and let y < x in X. Then we have:

f2(x)−f2(y) = f1(x)−f1(y)−(f(x)−f(y)) = Var[aX ,x](f)−Var[aX ,y](f)−(f(x)−f(y)).

Let  > 0, and find x0 < x1 < ··· < xn ≤ y such that

n X |f(xi) − f(xi−1)| ≥ Var[aX ,y](f) − . i=1

Using this inequality in the first equation yields

n X Var[aX ,x](f) − Var[aX ,y](f) ≥ |f(x) − f(y)| + |f(xi) − f(xi−1)| − Var[aX ,y](f) i=1 ≥ |f(x) − f(y)| − .

Taking  to 0 shows that f2(x) − f2(y) ≥ 0, so that f2 is weakly increasing also, hence f = f1 − f2 is the difference of two weakly increasing functions. Note that

Var(f1) = Var(f) < ∞ because f1 is weakly increasing, and f2 is the difference of two finite variation functions, hence also finite variation (by Lemma 3.1.3).

One-Sided Limits and Further Inequalities

If f :[a, b] → R is monotone with finite variation, then it has one-sided limits at every point of its domain, from all applicable sides. In arbitrary totally ordered, order

65 complete sets, it may not always be possible to approach a point from a particular side. Take, for example, X = [0, 1] ∪ {2} ∪ [3, 4], in its order topology (which is also the subspace topology from R). The point 1 is only a limit point from points less than 1, the point 3 is only a limit point from points larger than 3, and 2 is not a limit point of X, so if f : X → R is a function with finite variation, then it makes no sense to talk about any of the quantities

lim f(t), lim f(t), lim f(t). t→1+ t→2± t→3−

We make a definition to clarify for which points it does make sense to consider one-sided limits.

Definition 3.1.6. Let (X, ≤) be a totally ordered, order complete set equipped with its order topology. For x ∈ X, denote Ux = {y ∈ X : y > x} and Lx = {y ∈ X : y < x}. For x ∈ X and s ∈ {+, −},(x, s) is called a one-tailed point if s = + and x is in the closure of Ux, or if s = − and x is in the closure of Lx. A one-tailed point (x, s) is in a closed interval I ⊆ X when x is in the closure of I ∩ Ux if s = + or the closure of I ∩ Lx if s = −. If (x, s) ∈ X and f : X → R, then we define the one-sided limit of f from the s direction to be f(xs) := lim f(t), t→xs if the limit exists.

One may also say that (x, +) ∈ I when x ∈ I and x is adherent to I ∩ Ux (to use different terminology), and similarly for (x, −). The broad interpretation is that (x, s) is in some closed interval I when it is possible to approach x from the s direction from within I; this approaching phenomenon is like the point x having a tail in I, which is necessary for one-sided limits to make sense.

Example 3.1.7. Let X = [0, 1] ∪ {2} ∪ [3, 4], equipped with its order topology. Let I = [0, 0.5], J = [0.5, 1]∪{2}∪{3} (also known as [0.5, 3]), and K = [3, 4]. The picture looks like this:

X I J K

a. (0, +) ∈ I, because 0 ∈ I and I ∩ U0 = (0, 0.5], which closes to [0, 0.5], hence contains 0. On the other hand, (0, −) ∈/ I, because while 0 ∈ I, we have that

L0 = {y ∈ X : y < 0} = ∅,

and so the closure of I ∩ L0 is empty, and thus does not contain 0.

66 b. (0.25, ±) are both in I, since I ∩ U0.25 = (0.25, 0.5] and I ∩ L0.25 = [0, 0.25), and x is adherent to both of those sets (and is contained in I).

c. (0.5, −) is in I, because 0.5 ∈ I and I ∩ L0.5 = [0, 0.5), which closes to [0, 0.5],

thus containing 0.5. (0.5, +) is not in I, as I ∩ U−0.5 = ∅.

d. (1, −) is in J. (1, +) is not in J, because J ∩ U1 = {2} ∪ {3}, which does not have 1 as a limit point (it is closed and does not contain 1). The interpretation is that it is impossible to approach 1 from the right, because the nearest point to 1 is 2.

e. Neither (2, +) nor (2, −) are in J, since J ∩ U2 = {3} and J ∩ L2 = [0.5, 1], and 2 is not adherent to either of those. This means that isolated points x cannot be one-tailed in either direction. For the same reasons, (3, −) ∈/ J.

f. (3, +) ∈ K but (3, −) ∈/ K, for similar reasons to the above; similarly, (4, −) ∈ K but (4, +) is not.

Lemma 3.1.8. Let f : X → R have finite variation. Then for all one-tailed points (x, s) ∈ X, the one-sided limits f(xs) = lim f(t) exist. If [f] ∈ BV (λ) and (x, s) is a t→x± one-tailed point in X such that every open interval in the s direction from x ((a, x) if s = − and (x, b) if s = +) has positive measure, then g(xs) takes the same value for all g ∈ [f] with finite variation; hence [f](xs) is a well-defined quantity.

Proof. By Lemma 3.1.5, write f = f1 − f2, where fi : X → R are weakly increasing − and bounded. Suppose that s = −; then as t → x , for each i = 1, 2 we have fi(t) converging to sup {fi(t): t < x}, by definition of the supremum and the fact that each fi is weakly increasing and bounded. Similarly, if s = + then fi(t) converges to s s s s inf {fi(t): t > x}. Therefore each fi(x ) exists, and so f(x ) = f1(x ) − f2(x ) exists. Let [f] ∈ BV (λ), and let (x, s) be a one-tailed point in X. Suppose that s = − and that every (a, x) has positive measure. Let g1, g2 ∈ [f] with finite variation, and s s 1 s s suppose that g1(x ) > g2(x ). Let  = (g1(x ) − g2(x )), and observe that because Z 3 g1, g2 ∈ [f], we have |g1 − g2| dλ = 0. Since (a, x) has positive measure, there (a,x) exists ta ∈ (a, x) such that g1(ta) = g2(ta). By the definition of convergence, let a < x − − be such that for all t ∈ (a, x), g1(t) > g1(x ) −  and g2(t) < g2(x ) + . But then

− − g1(ta) > g1(x ) −  > g2(x ) +  > g2(ta),

− − which is a contradiction of the fact that g1(ta) = g2(ta). Hence g1(x ) = g2(x ), and so [f](x−) is well-defined. An analogous proof holds for s = +.

For the positive measure condition in the above lemma, we might say that (x, s) has a positive measure tail.

67 Lemma 3.1.9. Given b < y < c ∈ X, a one-tailed point (x, s) can only be in at most n S one of [b, y] and [y, c]. Hence, if X = [xi−1, xi] where xi−1 < xi for all i, then a i=1 one-tailed point (x, s) is in exactly one interval [xi−1, xi].

Proof. If (x, s) were in both [b, y] and [y, c], then x = y. However, Uy ∩ [b, y] = ∅ =

Ly ∩ [y, c], so depending on the value of s,(x, s) is not in [b, y] or [y, c], which is a n S contradiction. If X = [xi−1, xi], then x is either in exactly one [xi−1, xi] or x = xi i=1 for some 0 < i < n. In the former case, we have (x, s) ∈ [xi−1, xi], and in the latter case, (x, s) is in [xi, xi+1] if s = + or is in [xi−1, xi] if s = −. n S Lemma 3.1.10. Suppose that X = [xi−1, xi] and f : X → R with finite variation. i=1 Then we have n X Var(f) = Var[xi−1,xi](f). i=1

Suppose further that (x0, +), (xn, −), and all (xi, ±) are one-tailed points in X. Then we have

n X + +  Var(f) = Var(xi−1,xi)(f) + f(xi−1) − f(xi−1) + f(xi) − f(xi ) . i=1

Finally, suppose that each one-tailed endpoint in the partition has a positive measure tail and that none of the singleton sets {xi} are given positive measure by λ. Then for [f] ∈ BV (λ), we have:

n n−1 X X + − Var[g f] = Varg(xi−1,xi)[f] + [f](xi ) − [f](xi ) . i=1 i=1

Proof. Let y0 < ··· < ym be elements of X, where all of the xi are included in the yj. Using the definition of variation, we have

m ( n ) X X |f(yj) − f(yj−1)| ≤ min Var[xi−1,xi](f), Var(f) , j=1 i=1 since the xi are all included in the yj terms. However, we can take a supremum over all yj between each xi−1 and xi, and over all yj as a whole, in the previous inequality to obtain

( n ) ( n ) X X max Var[xi−1,xi](f), Var(f) ≤ min Var[xi−1,xi](f), Var(f) , i=1 i=1

Pn hence i=1 Var[xi−1,xi](f) = Var(f).

68 Now, suppose that (x0, +), (xn, −), and all (xi, ±) are one-tailed points in X. We look at [xi−1, xi]; let  > 0 and find y < z ∈ (xi−1, xi) such that for all t ∈ (xi−1, y) + − and t ∈ (z, xi), f(t) − f(xi−1) < /2 and f(t) − f(xi ) < /2 (since both (xi−1, +) and (xi, −) are one-tailed points). Let xi−1 = y0 < y1 < ··· < ym−1 < ym = xi, where y1 < y and ym−1 > z. We obtain

m X |f(yj) − f(yj−1)| j=1 m−1 + + X ≤ f(xi−1) − f(xi−1) + f(xi−1) − f(y1) + |f(yj) − f(yj−1)| j=2 − − + f(xi ) − f(ym−1) + f(xi) − f(xi ) + − ≤ Var(xi−1,xi)(f) + f(xi−1) − f(xi−1) + f(xi) − f(xi ) + .

Take a supremum over the yj and allow  to tend to zero to obtain

+ + Var[xi−1,xi](f) ≤ Var(xi−1,xi)(f) + f(xi−1) − f(xi−1) + f(xi) − f(xi ) .

For the other direction, we have

m−1 + X − f(xi−1) − f(xi−1) + |f(yj) − f(yj−1)| + f(xi) − f(xi ) j=2 m−1 + X ≤ |f(xi−1) − f(y1)| + f(xi−1) − f(y1) + |f(yj) − f(yj−1)| j=2 − + f(xi ) − f(ym−1) + |f(xi) − f(ym−1)|

≤ Var[xi−1,xi](f) + , which upon taking a supremum over the yj and taking  to zero yields the desired inequality, so that

+ + Var[xi−1,xi](f) = Var(xi−1,xi)(f) + f(xi−1) − f(xi−1) + f(xi) − f(xi ) .

Sum over i to complete this part of the proof. Finally, suppose further that every one-tailed endpoint has a positive measure tail, and no {xi} is assigned positive measure by λ. For every finite variation member + + − − g ∈ [f], we have g(xi−1) = [f](xi−1) and g(xi ) = [f](xi ), by Lemma 3.1.8. Since each {xi} has zero measure, given a finite variation member g of [f] we may redefine + − g(x0) = [f](x0 ) and g(xi) = [f](xi ) for i > 0 and still have g ∈ [f] with finite variation.

69 Redefining g in this way yields

+ + + − g(xi) − g(xi ) + g(xi) − g(xi ) = [f](xi ) − [f](xi ) .

From here, from the previous part of the lemma we obtain (reorganizing the summation as necessary):

n X + +  Var(g) = Var(xi−1,xi)(g) + g(xi−1) − g(xi−1) + g(xi) − g(xi ) i=1 n n−1 X X + − = Var(xi−1,xi)(g) + [f](xi ) − [f](xi ) . i=1 i=2

Take an infimum over all finite variation g ∈ [f] with redefined values at the xi to complete the proof, since redefining g at the xi shrinks the variation and thus desirable when considering the infimum.

Corollary 3.1.11. Let f : X → R have finite variation, and let C ⊆ X be a closed interval containing the one-tailed point (x, s), such that λ(C) > 0. Then Z s 1 |f(x )| ≤ |f| dλ + VarC (f). λ(C) C

If [f] ∈ BV (λ) and (x, s) is in the closed interval C with positive measure tail, then we have Z s 1 |[f](x )| ≤ [|f|] dλ + VargC [f]. λ(C) C Proof. By Lemma 3.1.2, we have that

sup {|f(x)| : x ∈ C} ≤ inf {|f(x)| : x ∈ C} + VarC (f) ≤ f(x) + VarC (f), for all x ∈ C. Integrate over C with respect to λ, divide by λ(C), and use the fact that limits preserve inequalities to see that:

f(xs) = lim f(t) ≤ sup {|f(x)| : x ∈ C} t→xs 1 Z ≤ |f(x)| dλ + VarC (f). λ(C) C

For [f] ∈ BV (λ) and (x, s) ∈ C with positive measure tail, we do exactly the same thing. First note, by Lemma 3.1.2, that

k[f]k ≤ ess inf(|f|) + VargC [f], ∞,C C and observe that on a full measure set of x in C we have ess infC (|f|) ≤ |f(x)|. Inte-

70 1 R grating and dividing by λ(C) yields ess infC (|f|) ≤ λ(C) C [|f|] dλ, and so just as before we obtain Z s 1 [f](x ) ≤ k[f]k∞,C ≤ [|f|] dλ + VargC [f]. λ(C) C We will have use for the following notation, which describes restricting a bounded variation function onto a closed interval and redefining it to be continuous from inside the interval at the endpoints.

Definition 3.1.12. For f : X → R a bounded variation function and H = [c, d] ⊆ X a closed interval where (c, +), (d, −) ∈ H (so that c < d), we define FH (f): X → R by  f(x), x ∈ (c, d),  0, x∈ / H, FH (f)(x) = + f(c ), x = c,  f(d−), x = d.

For [f] ∈ BV (λ), we define FH [f] = [FH (f)] (where some versions will not be bounded variation); observe that Lc, {c},(c, d), {d}, and Ud are all measurable in X, and f was already measurable, so FH (f) is also measurable. For any two f1, f2 ∈ [f] with finite variation, observe that FH (f1) and FH (f2) are also almost everywhere equal (since even if {c} and {d} have non-zero measure, every finite variation version has the same one-sided limits), so that FH [f] is well-defined. Example 3.1.13. In Figure 3.1, we have an example where X = [−1, 1] is a subinterval of the real line, f is a polynomial, and H = [−0.4, 0.3]. Observe that the function is essentially truncated at the endpoints of H, and set to 0 outside of H. Since f is continuous on X, the values of FH (f) at the endpoints of H are the same as those of f. The following corollary of Lemma 3.1.10 shows how we may compute the variation of these restricted functions.

Corollary 3.1.14. Write X = [aX , bX ]. Let f : X → R have finite variation and let H = [c, d] ⊆ X be a closed interval where (c, +), (d, −) ∈ H. Then we have:

+ 1 − 1 Var(FH (f)) = Var(c,d)(f) + f(c ) · (1 − {aX }(c)) + f(d ) · (1 − {bX }(d)).

For [f] ∈ BV (λ) and H = [c, d] where (c, +) and (d, −) both have positive measure tails, then we have

 + 1 − 1 Varg FH [f] = Varg(c,d)[f] + [f](c ) · (1 − {aX }(c)) + [f](d ) · (1 − {bX }(d)).

Hence FH [f] ∈ BV (λ).

71 0.4

0.3

0.2

0.1

0

−0.1

−0.2

−0.3

−0.4

−0.5

−0.6 −1 −0.5 0 0.5 1

3 2 Figure 3.1: FH (f), for f(x) = 36x + 4x − 3x − 0.1 restricted to [−1, 1] and H = [−0.4, 0.3].

Proof. We first write, by Lemma 3.1.10,

Var(FH (f)) = Var[aX ,c](FH (f)) + VarH (FH (f)) + Var[d,bX ](FH (f)).

We deal with the middle term by observing that FH (f) is right-continuous at c and left-continuous at d. For finite collections of points c = x0 < x1 < ··· < xK = d in H, we have, by the definition of FH (f)

K X |FH (f)(xk) − FH (f)(xk−1)| k=1 K−1 ! + X − = f(c ) − f(x1) + |f(xk) − f(xk−1)| + f(d ) − f(XK−1) . k=2

+ Taking a supremum over choices of points x1 < ··· < xK−1 means that x1 → c and − xK−1 → d , so the two limit terms in the equality vanish. The middle term is obtained using only points in (c, d), so taking the supremum yields

VarH (FH (f)) = 0 + Var(c,d)(f) + 0 = Var(c,d)(f).

For the Var[aX ,c](Fh(f)) term, we have two cases. First, if aX = c, then this term is zero. If not, then let x0 < x1 < . . . xK = c be a collection of points in [aX , c] (where

72 we may assume that c is one of those points). Then we have:

K K−1 X + X + |FH (f)(xk) − FH (f)(xk−1)| = f(c ) − 0 + |0 − 0| = f(c ) , k=1 k=1 regardless of what the points actually are. Taking a supremum then simply gives + Var[aX ,c](FH (f)) = |f(c )|. Putting these two cases together gives

+ 1 Var[aX ,c](FH (f)) = f(c ) · (1 − {aX }(c)).

The analogous proof works for the other term, giving

− 1 Var[d,bX ](FH (f)) = f(d ) · (1 − {bX }(d)).

Putting these three computations together results in the equality for f. For [f] ∈ BV (λ) and (c, +), (d, −) having positive measure tails in H, use Lemma

3.1.10 and the definition of FH [f] to obtain, in the case that aX < c < d < bX :     Varg FH [f] = Varg(aX ,c) FH [f] + Varg(c,d) FH [f] + Varg(d,bX ) FH [f] + − + − + FH [f](c ) − FH [f](c ) + FH [f](d ) − FH [f](d ) . + + = Varg(c,d)[f] + FH [f](c ) + FH [f](d ) .

Here we used the fact that every member of FH [f] looks like a member of [f] on (c, d). + Moreover, if aX = c, then there is no |FH [f](c )| term, and if bX = d, then there is no + |FH [f](d )| term, by the same argument as in the first part of this corollary. Putting all of the cases together, we have shown that

 + 1 − 1 Varg FH [f] = Varg(c,d)[f] + [f](c ) · (1 − {aX }(c)) + [f](d ) · (1 − {bX }(d)).

∞ Lemma 3.1.15. Let {fn}n=1 be a sequence of functions fn : C → R with finite varia- P∞ tion. Suppose that f(x) := n=1 fn(x) converges for all x ∈ X. Then

∞ X Var(f) ≤ Var(fn). n=1

∞ ∞ P  P∞  P∞ If {[fn]}n=1 ⊆ BV (λ), k[fn]k1 is finite, and [f] := n=1 fn = n=1[fn], then n=1 ∞ X Var[g f] ≤ Var[g fn]. n=1

Proof. For the actual functions fn, we switch the order of summation of a collection of non-negative numbers (a discrete Tonelli’s theorem, if you will). Given x0 < ··· < xm

73 in X, we have:

m m ∞ m ∞ X X X X X |f(xi) − f(xi−1)| = (fn(xi) − fn(xi−1)) ≤ |fn(xi) − fn(xi−1)| i=1 i=1 n=1 i=1 n=1 ∞ m ∞ X X X = |fn(xi) − fn(xi−1)| ≤ Var(fn). n=1 i=1 n=1

∞ X Taking a supremum over xi yields VarX (f) ≤ Var(fn). n=1 P∞ In the BV (λ) setting, if we have n=1 Var[g fn] = ∞ then we are done, so suppose P∞ n=1 Var[g fn] is finite. Let  > 0, and for each n ≥ 1 find gn ∈ [fn] such that  Var(gn) ≤ Var[g fn] + . 2n P∞ Because each fn = gn λ-a.e., we have that n=1 gn ∈ [f], and by choice of gn we see that ∞ ∞ ∞ ∞ X X X 1 X Var(gn) ≤ Var[g fn] +  = Var[g fn] +  < ∞. 2n n=1 n=1 n=1 n=1 Moreover, we have that for each x ∈ X,   ∞ ∞ ∞ X X X g (x) ≤ |g (x)| ≤  kg k + Var(g ) < ∞, n n  n 1 n  n=1 n=1 n=1 | {z } =k[fn]k1 so that the sequence of gn satisfies the first part of the lemma. Hence, we obtain

∞ ! ∞ ∞ X X X Var[g f] ≤ Var gn ≤ Var(gn) ≤ Var[g fn] + . n=1 n=1 n=1

∞ X Since  was arbitrary, we have Var[g f] ≤ Var[g fn], as desired. n=1 Lemma 3.1.15 allows us to show that BV (λ) is, in fact, a Banach space, and hence a valid space over which to apply Theorem 2.4.3.

Corollary 3.1.16. The normed space BV (λ) is complete with respect to k·kBV .

∞ P∞ Proof. Let [fn]n=1 be a sequence of elements of BV (λ) such that n=1 k[fn]kBV is P∞ finite. We will show that n=1[fn] converges in BV (λ); then BV (λ) is a Banach space, as every absolutely summable series converges (see Theorem 1.3.9 in [37]).

74 First, observe that by Lemma 3.1.2, we integrate over X to obtain Z k[h]k∞ ≤ ess inf[|h|] + Var[g h] ≤ |[h]| dλ + Var[g h] X

P∞ ∞ P∞ for every [h] ∈ BV (λ). Then n=1[fn] converges in L (λ) to [f], since n=1 k[fn]k∞ ∞ P∞ is finite and L (λ) is a Banach space. The sum n=1 k[fn]k1 is finite, as k[fn]k1 ≤ k[fn]kBV , so by Lemma 3.1.15 we have

∞ X Var[g f] ≤ Var[g fn], n=1 hence [f] ∈ BV (λ). Finally, we have:

N N X X [f] − [fn] ≤ [f] − [fn] −→ 0, N→∞ n=1 1 n=1 ∞ N ! ∞ ! ∞ X X X Varg [f] − [fn] = Varg [fn] ≤ Var[g fn] −→ 0, N→∞ n=1 n=N+1 n=N+1

PN which when put together imply that n=1[fn] converges to [f] in BV (λ). Remark 3.1.17. Some proofs of Corollary 3.1.16 first prove that the unit ball in BV (λ) is compact in the L1 norm and obtain completeness of BV (λ) from there (e.g. Lemma 5 in [22]; the authors show compactness and use a general lemma about the relation between Hausdorff topologies on the same vector space). We do not use that fact, which relies on a nicer underlying space X than we have assumed so far (no isolated points, being able to redefine functions on any set of measure zero to exactly obtain the infimum in Var).g In our application, we only need completeness of BV (λ) and not the compact embedding, so we are satisfied with the proof given.

Remark 3.1.18. Unfortunately, BV (λ) is not separable in most cases (barring trivial cases like λ being a sum of finitely many point masses or similar). To see this fact, consider the set D = {[1

75 A Nice Cone in BV (λ) We will be working in BV (λ) for our main application of Theorem 2.4.3, and so we need a nice cone. It turns out that there is such a cone, built from the cone of (equivalence classes of) non-negative functions and the integral (considered as a linear functional). The use of this cone was popularized by Liverani in [33].

Example 3.1.19. Let (X, ≤, λ) be a totally ordered, order complete set equipped with its order topology and a complete regular Borel probability measure λ on X. Consider

BV (λ) equipped with k·kBV , and for a > 0 let n o Ca = [f] ∈ BV (λ):[f] ≥ 0, Var[g f] ≤ a k[f]k1 \{[0]}.

Then Ca is a nice cone in (BV (λ), k·kBV ). To see this fact, we will show that Ca is the intersection of two cones, the non-negative cone C+ = {[f] ∈ BV (λ):[f] ≥ 0}\{0} (where we consider the inequality λ-almost everywhere) and the Bishop-Phelps (or “ice cream”) cone CR ,1/(1+a). The non-negative cone is not D-adapted for any D, but that is the only way in which C+ is not nice; on the other hand, CR ,1/(1+a) is nice, and so we apply Lemma 2.1.8 to obtain a nice cone.

First, we see that $C_+$ is a salient, blunt, closed, convex cone: an equivalence class that is both non-negative and non-positive must be zero, the zero vector is not in $C_+$, the inequality condition is preserved by positive scalar multiplication and addition within $C_+$, and $C_+ \cup \{0\}$ is closed in $BV(\lambda)$ since limits also preserve the inequality condition. Then, note that $\mathbf{1}$ is an interior point of $C_+$: if $[g] \in BV(\lambda)$ has $\|[g]\|_{BV} < 1$, then by Lemma 3.1.2 and monotonicity of the integral we have
\[
\operatorname{ess\,sup}|[g]| \le \operatorname{ess\,inf}|[g]| + \widetilde{\operatorname{Var}}[g] \le \int_X |[g]|\,d\lambda + \widetilde{\operatorname{Var}}[g] = \|[g]\|_{BV} < 1,
\]
and hence $\mathbf{1} - [g] \ge 0$. Thus $C_+$ generates $BV(\lambda)$.

Note that the integral $\int_X : BV(\lambda) \to \mathbb{R}$ is a bounded linear functional: if $\|[f]\|_{BV} \le 1$, then we have
\[
\left| \int_X [f]\,d\lambda \right| \le \|[f]\|_1 \le \|[f]\|_{BV} \le 1.
\]
Moreover, $\int_X \mathbf{1}\,d\lambda = 1$, and so the norm of the integral is exactly 1. Then $(1+a)^{-1}$ is smaller than 1, so the Bishop-Phelps cone $C_{\int,\,1/(1+a)}$ is a nice cone, by Example 2.1.6, with adapted constant $D = 1+a$. (Note that the integral is a strictly positive linear functional.)

We now take the intersection of $C_+$ and $C_{\int,\,1/(1+a)}$ and show that it is equal to $C_a$. Observe that if $[f]$ is an element of $C_+ \cap C_{\int,\,1/(1+a)}$, then $\|[f]\|_1 = \int_X [f]\,d\lambda$. We explicitly write down the condition from $C_{\int,\,1/(1+a)}$:
\[
\int_X [f]\,d\lambda \ge \frac{1}{1+a}\,\|[f]\|_{BV} = \frac{1}{1+a}\left( \widetilde{\operatorname{Var}}[f] + \int_X [f]\,d\lambda \right),
\]
which (after some algebra) rearranges to
\[
\widetilde{\operatorname{Var}}[f] \le a \int_X [f]\,d\lambda.
\]

Hence, we have $C_+ \cap C_{\int,\,1/(1+a)} = C_a$; from this, we obtain every property of a nice cone (including the adapted constant $D = 1+a$) except the generating property. To see that $C_a$ generates $BV(\lambda)$, we show that $\mathbf{1}$ is also an interior point of $C_{\int,\,1/(1+a)}$ and use Lemma 2.1.4. We compute:
\[
\frac{\|\mathbf{1}\|_{BV}}{1+a} = \frac{1}{1+a}\left( \widetilde{\operatorname{Var}}(\mathbf{1}) + \|\mathbf{1}\|_1 \right) = \frac{1}{1+a} < 1 = \int_X \mathbf{1}\,d\lambda.
\]
Thus, again using Example 2.1.6, $\mathbf{1}$ is an interior point not just of $C_+$ but also of $C_{\int,\,1/(1+a)}$, and hence of $C_a$. The proof is complete.

Later, in Lemma 4.4.1, we will investigate an upper bound on the projective metric in this cone. The bound is better motivated in the context of our application, so we leave it until Section 4.4.

3.2 Statement and Proof of the Inequality

In order to state the inequality, we must specify what types of maps are handled by the inequality, and we need to define what we mean by "hanging points" (a term borrowed from Eslami and Góra [12]). As previously mentioned, we follow Rychlik in terms of the types of maps; in Definition 3.2.1, the last hypothesis echoes Rychlik's setup in [48].

Dynamical Definitions

Definition 3.2.1. The map $T : X \to X$ satisfies the assumptions M when the following hold.

• $(X, \le) = ([a_X, b_X], \le)$ is a totally ordered, order-complete set equipped with its order topology, together with a complete regular Borel probability measure $\lambda$.

• There are no isolated points in $X$.

• There exists a countable cover of $X$ by closed intervals $\{I_n\}_{n \in N}$ with $I_n = [a_n, b_n]$, where $a_n < b_n$; the index set $N$ may be finite or countably infinite.

• $(a_n, b_n) \cap (a_m, b_m) = \emptyset$ for all $n \ne m$.

• $(a_n, +)$ and $(b_n, -)$ are both one-tailed points of $(X, \le)$ contained in $I_n$ with positive measure tails.

• $\bigcup_n (a_n, b_n)$ is dense in $X$ and has measure 1.

• $T|_{(a_n, b_n)}$ is continuous and extends to a homeomorphism $T_n : I_n \to T_n(I_n) \subseteq X$.

• There exists a bounded measurable function $g : X \to [0, \infty)$ such that $L(f)(x) := \sum_{y \in T^{-1}(x)} g(y) f(y)$ defines an operator that preserves $\lambda$ (that is, $\lambda(L(f)) = \lambda(f)$ for all integrable functions $f$), $g$ has finite variation on each $I_n$, and $g$ is 0 at the endpoints of each $I_n$.

We use the shorthand notation $\lambda(f) = \int_X f\,d\lambda$, and we will suppress $N$ where appropriate. Denote by $I_n^o$ the interval $(a_n, b_n)$, which is not necessarily the topological interior of $I_n$ (and therefore this notation is somewhat non-standard). The intervals $I_n$ are sometimes called intervals of monotonicity, since $T$ is monotone on each $I_n^o$ by the homeomorphism requirement. Let $E = \{(a_n, +), (b_n, -) : 1 \le n \le N\}$ be the collection of one-tailed points located at endpoints of intervals of monotonicity, inside those intervals. Let $K_n = T_n(I_n)$ be the image of the homeomorphism $T_n$, with $T_n(a_n) = T(a_n^+)$ and $T_n(b_n) = T(b_n^-)$. Setting $g$ equal to 0 at endpoints of intervals of monotonicity simplifies the computations with $L$ while not actually affecting any of the calculations to be done later; note that there is no requirement that $g$ be continuous at endpoints, so usually the one-sided limits are non-zero. Note that $g$ has one-sided limits at every point, since the $I_n$ cover $X$ and $g$ has finite variation on each $I_n$.

Lemma 3.2.2. Let $X$ and $T$ satisfy the conditions in M. Setting $L[f] := [L(f)]$ defines an operator $L : L^1(\lambda) \to L^1(\lambda)$.

Proof. First, observe that if $f : X \to \mathbb{R}$ is non-negative and integrable, then $L(f)$ is also non-negative and integrable. Thus if $f \le g$, then $L(f) \le L(g)$. Applying this to $-f^- \le f \le f^+$ shows that $L(f)^+ \le L(f^+)$ and $L(f)^- \le L(f^-)$.

Now, if $f_1, f_2 \in [f] \in L^1(\lambda)$, then $(f_1 - f_2)^{\pm}$ are $\lambda$-a.e. zero and so integrate to zero. But then we have
\[
0 \le \lambda\left( (L(f_1) - L(f_2))^+ \right) = \lambda\left( L(f_1 - f_2)^+ \right) \le \lambda\left( L\left( (f_1 - f_2)^+ \right) \right) = \lambda\left( (f_1 - f_2)^+ \right) = 0,
\]

and similarly for $\lambda\left( (L(f_1) - L(f_2))^- \right)$; hence $L(f_1) = L(f_2)$ $\lambda$-almost everywhere, and hence $L[f] := [L(f)]$ is well-defined in $L^1(\lambda)$ (since the integral is preserved).

The class of maps on spaces as in M is closed under composition; this is a reasonable thing to expect, given that in dynamical systems we repeatedly apply maps to a space.

Lemma 3.2.3. Let $(X, \le, \lambda)$ be as in Definition 3.2.1 and let $T, S$ be two maps satisfying M on $(X, \le, \lambda)$. Then $S \circ T$ also satisfies M.

Proof. First, the space $X$ is the same. The new intervals are the countable collection of non-empty intersections $K_{n,m} = I_n \cap T_n^{-1}(J_m)$, where we index over $n \in N$ and $m \in M$ for the maps $T$ and $S$, respectively. These sets are intervals because $T_n$ is monotone, and they are closed because $T_n$ is continuous and each $I_n, J_m$ is closed. We have $K_{n,m}^o \cap K_{n',m'}^o = \emptyset$ because the same relation holds for the $I_n^o$ and $J_m^o$ separately, and the union of the $K_{n,m}^o$ is still dense with measure 1 for the same reason. The endpoints of each $K_{n,m}$ are one-tailed and in $K_{n,m}$, because either they are one-tailed endpoints of $I_n$ or they are one-tailed endpoints of $J_m$ (hence of $T_n^{-1}(J_m)$); the positive measure tails occur for the same reason. Clearly there are still no isolated points in $X$.

For the map $S \circ T$, observe that on $K_{n,m}^o$ we have $S \circ T = S_m \circ T_n$, which is a composition of homeomorphisms on $K_{n,m}$, hence extends to $S_m \circ T_n$ on $K_{n,m}$. The scale function $g$ is more challenging: set $g_{S \circ T} = g_S \circ T \cdot g_T$, and let $L_{S \circ T}$ be the operator as in Definition 3.2.1. Note that $x = S \circ T(z)$ if and only if $x = S(y)$ for some $y = T(z)$. Hence, for $f : X \to \mathbb{R}$ integrable, we have:
\[
L_{S \circ T}(f)(x) = \sum_{z \in (S \circ T)^{-1}\{x\}} g_S(T(z))\, g_T(z)\, f(z) = \sum_{y \in S^{-1}\{x\}} g_S(y) \sum_{z \in T^{-1}\{y\}} g_T(z) f(z) = L_S(L_T(f))(x).
\]
Hence $L_{S \circ T} = L_S \circ L_T$. Because $L_S$ and $L_T$ both preserve $\lambda$, so does $L_{S \circ T}$. The function $g_{S \circ T}$ is bounded and measurable, and is equal to zero at the endpoints of each $K_{n,m}$, since each $T_n$ is a homeomorphism and $g_S$ and $g_T$ are equal to zero at the endpoints of $J_m$ and $I_n$, respectively. Moreover, $g_{S \circ T}$ has finite variation on each $K_{n,m}$ because $g_S$ has finite variation on $J_m$ and $g_T$ has finite variation on $I_n$, so $g_S \circ T \cdot g_T$ has finite variation on $I_n \cap T_n^{-1}(J_m)$.

The following lemma is mostly Rychlik's Remark 2 in [48]; it is necessary because the assumptions made on $L$ do not explicitly give these properties. Stating the situation in this manner also allows us to collect easily-referenced results, as well as do some basic computations regarding the operator $L$. We continue to explicitly deal with equivalence classes of functions.

Lemma 3.2.4. Let $X$ and $T$ satisfy the assumptions in M. The operator $L : L^1(\lambda) \to L^1(\lambda)$ has the following properties:

1. $L([f \circ T] \cdot [h]) = [f] \cdot L[h]$ for $[f] \in L^{\infty}(\lambda)$ and $[h] \in L^1(\lambda)$;

2. $L[\mathbf{1}_A] = \left[ g \circ T_n^{-1} \cdot \mathbf{1}_{T_n(A)} \right]$ for $n \in N$ and $A \subseteq I_n$ measurable;

3. if $[h] \in L^1(\lambda)$ and $[h] \ge 0$, then $L[h] \ge 0$.

Moreover, we have:

4. $T$ is non-singular with respect to $\lambda$;

5. $g$ is almost everywhere non-zero, $\left[ \mathbf{1}_{I_n} \frac{1}{g} \right] \in L^1(\lambda)$ for each $n \in N$, and $L\left[ \mathbf{1}_{I_n} \frac{1}{g} \right] = [\mathbf{1}_{K_n}]$ for each $n \in N$;

6. $\frac{1}{g}$ is the Jacobian of $T$, in the sense that for each $n$ and $[f] \in L^{\infty}(X)$,
\[
\int_{K_n} [f]\,d\lambda = \int_{I_n} \left[ f \circ T \cdot \frac{1}{g} \right] d\lambda;
\]

7. the image of a measure zero set under $T$ is a measure zero set.

Finally, $L$ is the Perron-Frobenius operator on $L^1(\lambda)$ corresponding to $T$. Moreover, it has the following alternative form, for integrable functions $f : X \to \mathbb{R}$:
\[
L(f) = \sum_n \left( g \circ T_n^{-1} \right) \left( f \circ T_n^{-1} \right) \mathbf{1}_{T(I_n^o)},
\]
and for $[f] \in BV(\lambda)$, we have
\[
L[f] = \sum_n F_{K_n}\left[ g \circ T_n^{-1} \right] F_{K_n}\left[ f \circ T_n^{-1} \right].
\]

Proof. First, because $T$ has countably many measurable branches, for $[f] \in L^{\infty}(X)$ and $[h] \in L^1(X)$ we have $[f \circ T] \in L^{\infty}(X)$ and so $[f \circ T] \cdot [h] \in L^1(X)$. For ($\lambda$-almost) any $x \in X$,
\[
L(f \circ T \cdot h)(x) = \sum_{y \in T^{-1}\{x\}} g(y) f(T(y)) h(y) = f(x) \sum_{y \in T^{-1}\{x\}} g(y) h(y) = f(x) L(h)(x),
\]
which implies that $L([f \circ T] \cdot [h]) = [f] \cdot L[h]$. Then, for fixed $n \in N$ and $A \subseteq I_n$ measurable, we have:
\[
L(\mathbf{1}_A)(x) = \sum_{y \in T^{-1}\{x\}} g(y) \mathbf{1}_A(y) =
\begin{cases}
0, & x \notin T(A \cap I_n^o), \\
g(T_n^{-1}(x)), & x \in T(A \cap I_n^o)
\end{cases}
= g(T_n^{-1}(x))\, \mathbf{1}_{T_n(A)}(x),
\]
using the fact that $g$ is zero at the endpoints of $I_n$; now take equivalence classes. To see that $L$ takes non-negative equivalence classes to non-negative equivalence classes, let

$[h] \in L^1(\lambda)$ and choose $h_1 \in [h]$ so that $h_1 \ge 0$ everywhere. Then $L(h_1)(x)$ is a sum of non-negative terms (since $g$ and $h_1$ are both non-negative), hence $L(h_1)$ is pointwise non-negative. Then $L[h]$ has a non-negative representative, hence is a non-negative equivalence class.

Next, to see that $T$ is non-singular, let $A \subseteq X$ have measure 0, and let $n \in N$. We compute:
\[
\lambda(T^{-1}(A) \cap I_n) = \lambda(\mathbf{1}_A \circ T \cdot \mathbf{1}_{I_n}) = \lambda(L(\mathbf{1}_A \circ T \cdot \mathbf{1}_{I_n})) = \lambda(\mathbf{1}_A \cdot L(\mathbf{1}_{I_n})) = \int_X \mathbf{1}_A \, g \circ T_n^{-1} \cdot \mathbf{1}_{K_n} \, d\lambda = \int_{A \cap K_n} g \, d\lambda = 0,
\]
since $\lambda(A \cap K_n) = 0$. Then, since
\[
\lambda(T^{-1}(A)) = \lambda\left( \bigcup_{n \in N} T^{-1}(A) \cap I_n \right) \le \sum_{n \in N} \lambda(T^{-1}(A) \cap I_n) = 0,
\]
we see that $T$ is non-singular. It is easy to see that $g$ is almost everywhere non-zero, because
\[
L(\mathbf{1}_{g^{-1}\{0\}})(x) = \sum_{y \in T^{-1}\{x\}} g(y) \mathbf{1}_{g^{-1}\{0\}}(y) = 0,
\]
and so
\[
\lambda(g^{-1}\{0\}) = \lambda(\mathbf{1}_{g^{-1}\{0\}}) = \lambda(L(\mathbf{1}_{g^{-1}\{0\}})) = \lambda(0) = 0.
\]

Then, fix $n \in N$. The extension $T_n$ is non-singular as a map from $I_n$ to $K_n$ because $T$ is non-singular, so the measure $(T_n)_* \left( \lambda|_{I_n} \right)$ is absolutely continuous with respect to $\lambda|_{K_n}$. This means that there exists $[h] \in L^1\left( \lambda|_{K_n} \right)$ such that $[h] \ge 0$ and
\[
\int_{I_n} [f] \circ T_n \, d\lambda = \int_{K_n} [f] \cdot [h] \, d\lambda
\]
for all $[f] \in L^{\infty}(K_n)$ ($[h]$ is the appropriate Radon-Nikodym derivative). Since $T_n$ and $T_n^{-1}$ are homeomorphisms and $\lambda$ is a Borel measure, $B \subseteq K_n$ is measurable if and only if $B = T_n(A)$ for some measurable set $A \subseteq I_n$. For any such $A$, we write:
\[
\int_{T_n(A)} [h] \, d\lambda = \int_A \mathbf{1} \, d\lambda = \lambda(\mathbf{1}_A) = \lambda(L(\mathbf{1}_A)) = \int_{T_n(A)} g \circ T_n^{-1} \, d\lambda,
\]
which shows that on $K_n$, $h = g \circ T_n^{-1}$ $\lambda$-almost everywhere. Now, set $v_m = \frac{1}{g} \cdot \mathbf{1}_{g^{-1}[m^{-1}, \infty)}$;

we have $v_m \xrightarrow[m \to \infty]{} \frac{1}{g}$ pointwise from below, so by monotone convergence and the properties of $g$ and $h$ above, we compute:
\[
\int_{I_n} \frac{1}{g} \, d\lambda = \lim_{m \to \infty} \int_{I_n} v_m \, d\lambda = \lim_{m \to \infty} \int_{I_n \cap g^{-1}[m^{-1}, \infty)} \frac{1}{g} \, d\lambda = \lim_{m \to \infty} \int_{T_n(g^{-1}[m^{-1}, \infty))} \frac{1}{g \circ T_n^{-1}} \cdot h \, d\lambda = \lim_{m \to \infty} \int_{(g \circ T_n^{-1})^{-1}[m^{-1}, \infty)} \mathbf{1} \, d\lambda \le 1.
\]
This shows that $\mathbf{1}_{I_n} \frac{1}{g}$ is integrable. So, for $n \in N$ and $x$ not an endpoint of $K_n$, we have:
\[
L\left( \mathbf{1}_{I_n} \cdot \frac{1}{g} \right)(x) = \sum_{y \in T^{-1}\{x\}} g(y) \mathbf{1}_{I_n}(y) \frac{1}{g(y)} = \sum_{y \in T^{-1}\{x\}} \mathbf{1}_{I_n}(y) = \mathbf{1}_{K_n}(x),
\]
since the $I_n$ form a partition of $X$ in which the only overlap is at endpoints. Let $n \in N$ and $[f] \in L^{\infty}(X)$. Using the above relations, we get:
\[
\int_{K_n} [f] \, d\lambda = \lambda([f \cdot \mathbf{1}_{K_n}]) = \lambda\left( [f] \cdot L\left[ \mathbf{1}_{I_n} \cdot \frac{1}{g} \right] \right) = \lambda\left( L\left[ f \circ T \cdot \mathbf{1}_{I_n} \cdot \frac{1}{g} \right] \right) = \lambda\left( [f \circ T] \cdot \left[ \mathbf{1}_{I_n} \cdot \frac{1}{g} \right] \right) = \int_{I_n} [f \circ T] \cdot \frac{1}{g} \, d\lambda.
\]

This shows that $\frac{1}{g}$ is the Jacobian of $T$ (where reasonable). Let $A \subseteq X$ be a measure zero set. Then $T(A \cap I_n) = T_n(A \cap I_n) \subseteq K_n$ for each $n$, so the measure of $T(A \cap I_n)$ is given by:
\[
\lambda(T(A \cap I_n)) = \int_{K_n} \mathbf{1}_{T_n(A \cap I_n)} \, d\lambda = \int_{I_n} \mathbf{1}_{T_n(A \cap I_n)} \circ T_n \cdot \frac{1}{g} \, d\lambda = \int_{I_n} \mathbf{1}_A \cdot \frac{1}{g} \, d\lambda = \int_{A \cap I_n} \frac{1}{g} \, d\lambda = 0,
\]
since $A \cap I_n$ has measure zero. Then by subadditivity, we have:
\[
\lambda(T(A)) = \lambda\left( T\left( \bigcup_n A \cap I_n \right) \right) = \lambda\left( \bigcup_n T(A \cap I_n) \right) \le \sum_{n \in N} \lambda(T(A \cap I_n)) = 0.
\]

To see that $L$ is actually the Perron-Frobenius operator associated to $T$, it is enough to show that $L$ satisfies the duality relation with the Koopman operator $\cdot \circ T$. So let $[w] \in L^1(X)$ and $[f] \in L^{\infty}(X)$. By the first property of $L$ and the fact that $L$ preserves integrals, we get:
\[
\lambda([f] \cdot L[w]) = \lambda(L[f \circ T \cdot w]) = \lambda([f \circ T] \cdot [w]),
\]
which is exactly the duality relation.

Finally, let $x \in X$. If $T(y) = x$ but $y \notin I_n^o$ for every $n$, then $g(y) = 0$ and we may ignore that term. Then we have, summing over the branches of $T$:
\[
L(f)(x) = \sum_{y \in T^{-1}\{x\}} g(y) f(y) = \sum_n g(T_n^{-1}(x)) f(T_n^{-1}(x)) \, \mathbf{1}_{T(I_n^o)}(x).
\]

For $[f] \in BV(\lambda)$, we still have $L(f)$ as just described, and taking equivalence classes yields
\[
L[f] = \sum_n \left[ \left( g \circ T_n^{-1} \right) \mathbf{1}_{T(I_n^o)} \right] \left[ f \circ T_n^{-1} \right].
\]
Since on $K_n$ we have $\left( g \circ T_n^{-1} \right) \mathbf{1}_{T(I_n^o)} = g \circ T_n^{-1}$, it follows that
\[
F_{K_n}\left[ \left( g \circ T_n^{-1} \right) \mathbf{1}_{T(I_n^o)} \right] = F_{K_n}\left[ g \circ T_n^{-1} \right],
\]
and so substituting into the previous equality gives
\[
L[f] = \sum_n F_{K_n}\left[ g \circ T_n^{-1} \right] F_{K_n}\left[ f \circ T_n^{-1} \right].
\]

The proof is complete.

The main application of our inequality will be to a family of piecewise linear maps, hence to piecewise $C^1$ maps. As an example, we will see that piecewise $C^1$ maps on intervals (with countably many branches and a variation condition on the branches) satisfy M.

Example 3.2.5. Let $X = [a, b] \subseteq \mathbb{R}$, let $\lambda$ be normalized Lebesgue measure on $X$, and let $T : X \to X$ be a piecewise $C^1$ map with piecewise finite variation derivative. We take this to mean that there is a countable collection of disjoint intervals $(a_n, b_n) \subseteq X$ such that $T|_{(a_n, b_n)}$ extends to a monotone $C^1$ map $T_n$ on $[a_n, b_n]$, $U = \bigcup_n (a_n, b_n)$ is dense and of full measure in $X$, $\inf\{|T'(x)| : x \in U\} > 0$, and $T'$ has finite variation on each $[a_n, b_n]$. In particular, each $T_n'$ is continuous on $[a_n, b_n]$.

To see that this $X$ and $T$ fit the situation in M, observe first that the space $X$ along with the usual order on $\mathbb{R}$ makes $(X, \le)$ into a totally ordered, order-complete space, and that normalized Lebesgue measure is a complete regular Borel probability measure on $X$. The space $X$ has no isolated points. Set $I_n = [a_n, b_n]$; by the choice of $a_n$ and $b_n$, $I_n^o \cap I_m^o = \emptyset$ for $n \ne m$, and $U = \bigcup_n I_n^o$ is dense with full measure. Each $(a_n, +)$ and $(b_n, -)$ is a one-tailed point in $I_n$ with positive measure tail, since $a_n < b_n$ in $\mathbb{R}$, $\mathbb{R}$ is a linear continuum, and Lebesgue measure assigns positive measure to all open intervals. To see that the maps $T_n$ are homeomorphisms, note that the derivative can never be 0, which forces $T_n$ to be either increasing or decreasing on each interval, hence one-to-one. Since each $T_n$ is a continuous injection of a compact interval into a Hausdorff space, each $T_n$ is a homeomorphism onto its image, by a general theorem.

Finally, let $g(x) = |T'(x)|^{-1}$ for $x \in U$ and $g(x) = 0$ otherwise. Then $g$ is measurable and zero at the endpoints of each $I_n$, and $g(x) \le (\inf\{|T'(x)| : x \in U\})^{-1} < \infty$ uniformly, so $g$ is bounded with $\|g\|_{\infty} = (\inf\{|T'(x)| : x \in U\})^{-1}$. We compute an upper bound for the variation of $g$ on $[a_n, b_n]$ by definition; for any $a_n \le x_0 < \cdots < x_k \le b_n$, we have:

\[
\sum_{i=1}^k |g(x_i) - g(x_{i-1})| = \sum_{i=1}^k \frac{1}{|T'(x_i)|\,|T'(x_{i-1})|} \left| |T'(x_{i-1})| - |T'(x_i)| \right| = \sum_{i=1}^k \frac{1}{|T'(x_i)|\,|T'(x_{i-1})|} \left| T'(x_i) - T'(x_{i-1}) \right|,
\]
where we observe that $T'$ has the same sign across $[a_n, b_n]$ because it is continuous and its absolute value is bounded away from zero. Since $T'$ is assumed to have finite variation on $[a_n, b_n]$ and $|T'|$ is bounded below, we obtain
\[
\operatorname{Var}_{[a_n, b_n]}(g) \le \|g\|_{\infty}^2 \operatorname{Var}_{[a_n, b_n]}(T') < \infty.
\]

For integrable functions $f$, define $L(f)$ as above. Note that the pointwise formula given at the end of Lemma 3.2.4 is valid without checking anything else, so $L(f)$ is a sum of measurable functions, hence measurable. The absolute value on $\mathbb{R}$ is countably subadditive, so pointwise we have $|L(f)| \le L(|f|)$. By Tonelli's theorem we may interchange the integral and the summation, and so we obtain:
\[
\int_X |L(f)| \, d\lambda \le \int_X \sum_{n \in N} g \circ T_n^{-1} \cdot |f| \circ T_n^{-1} \cdot \mathbf{1}_{T_n(a_n, b_n)} \, d\lambda = \sum_{n \in N} \int_{K_n} \frac{1}{|T_n'| \circ T_n^{-1}} \cdot |f| \circ T_n^{-1} \, d\lambda = \sum_{n \in N} \int_{I_n} |f| \, d\lambda = \|f\|_1 < \infty.
\]
Hence $L(f)$ is integrable, and so the summation really does converge $\lambda$-almost everywhere. The same argument using Fubini's theorem shows that $\lambda(L(f)) = \lambda(f)$:
\[
\int_X L(f) \, d\lambda = \sum_{n \in N} \int_{K_n} g \circ T_n^{-1} \, f \circ T_n^{-1} \, d\lambda = \sum_{n \in N} \int_{I_n} f \, d\lambda = \int_X f \, d\lambda.
\]
This shows that the assumptions in M are met.

In the previous example, we absolutely require the restriction that $T'$ have finite variation on each interval $[a_n, b_n]$ in order for $g = |T'|^{-1}$ to have finite variation; see Example A.3.1.

Example 3.2.6. Let $T : [0, 1] \to [0, 1]$ be given by $T(x) = \frac{4}{3}x + \frac{1}{3} \bmod 1$, and equip $[0, 1]$ with its usual topology and Lebesgue measure. The map $T$ is piecewise linear, and we see that $g(x) = \frac{3}{4}$ except for $x \in \{0, 1/2, 1\}$, where $g(x)$ is zero, so it fits into Example 3.2.5. Let $f : [0, 1] \to \mathbb{R}$ be given by
\[
f(x) =
\begin{cases}
\frac{1}{2} + x^2, & x \le \frac{1}{2}, \\
1 - x, & x > \frac{1}{2}.
\end{cases}
\]
The map $T$ and the function $f$ look as in Figures 3.2(a) and 3.2(b), respectively.

[Figure 3.2: The setup in Example 3.2.6. (a) The map $T$. (b) The function $f$.]

To compute $L(f)$ explicitly, we note that $I_1 = [0, 1/2]$, $I_2 = [1/2, 1]$, $K_1 = [1/3, 1]$, and $K_2 = [0, 2/3]$. The map $T_1$ is given by $T_1(x) = \frac{4}{3}x + \frac{1}{3}$ for $x \in [0, 1/2]$ and has inverse $T_1^{-1}(x) = \frac{3}{4}x - \frac{1}{4}$ on domain $[1/3, 1]$. The map $T_2$ is given by $T_2(x) = \frac{4}{3}x - \frac{2}{3}$ for $x \in [1/2, 1]$ and has inverse $T_2^{-1}(x) = \frac{3}{4}x + \frac{1}{2}$ with domain $[0, 2/3]$. We have
\[
L(f) = F_{K_1}\left[ g \circ T_1^{-1} \right] F_{K_1}\left[ f \circ T_1^{-1} \right] + F_{K_2}\left[ g \circ T_2^{-1} \right] F_{K_2}\left[ f \circ T_2^{-1} \right] = \frac{3}{4}\left( F_{K_1}\left[ f \circ T_1^{-1} \right] + F_{K_2}\left[ f \circ T_2^{-1} \right] \right),
\]
because $g$ has one-sided limit $3/4$ everywhere.

Let us compute $F_{K_1}\left[ f \circ T_1^{-1} \right](x)$. For $x \in [0, 1/3)$, this is 0, by definition of $F_{K_1}$. For $x \in (1/3, 1)$, it is equal to
\[
f(T_1^{-1}(x)) = f\left( \frac{3}{4}x - \frac{1}{4} \right) = \frac{1}{2} + \left( \frac{3}{4}x - \frac{1}{4} \right)^2,
\]
because $T_1^{-1}(x) \in (0, 1/2)$. For $x \in \{1/3, 1\}$, we take the limit of $f(T_1^{-1})$ from inside $(1/3, 1)$, and so we simply get $\frac{1}{2} + \left( \frac{3}{4}x - \frac{1}{4} \right)^2$ because that function is continuous. Thus we may write
\[
F_{K_1}\left[ f \circ T_1^{-1} \right](x) = \left( \frac{1}{2} + \left( \frac{3}{4}x - \frac{1}{4} \right)^2 \right) \cdot \mathbf{1}_{[1/3, 1]}(x).
\]

This function is the blue graph in Figure 3.3(a). Graphically, we have taken f on [0, 1/2], translated it and stretched it across [1/3, 1] according to the action of T , and scaled by g.

Similarly for K2, we have

\[
F_{K_2}\left[ f \circ T_2^{-1} \right](x) = \left( \frac{1}{2} - \frac{3}{4}x \right) \cdot \mathbf{1}_{[0, 2/3]}(x).
\]

This function is the pink graph in Figure 3.3(a). We then see that L(f) is given by the function depicted in Figure 3.3(b), by adding the two pieces together.

[Figure 3.3: Computing $L(f)$ using Lemma 3.2.4. (a) The pieces $F_{K_n}\left[ f \circ T_n^{-1} \right]$. (b) The function $L(f)$.]
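This computation is easy to check numerically. The following sketch (ours, not part of the dissertation's formal development; all function names are ad hoc) evaluates the branch formula for $L(f)$ from Lemma 3.2.4 on a grid and confirms that the integral is preserved, i.e. $\lambda(L(f)) = \lambda(f) = 5/12$.

```python
import numpy as np

def f(x):
    x = np.asarray(x, dtype=float)
    return np.where(x <= 0.5, 0.5 + x**2, 1.0 - x)

def L_f(x):
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    # Branch 1: T_1(y) = (4/3)y + 1/3 on [0, 1/2]; K_1 = [1/3, 1]; g = 3/4.
    out += np.where((x >= 1/3) & (x <= 1), 0.75 * f(0.75 * x - 0.25), 0.0)
    # Branch 2: T_2(y) = (4/3)y - 2/3 on [1/2, 1]; K_2 = [0, 2/3]; g = 3/4.
    out += np.where((x >= 0) & (x <= 2/3), 0.75 * f(0.75 * x + 0.5), 0.0)
    return out

# Riemann-sum check that L preserves Lebesgue measure on [0, 1].
x = np.linspace(0.0, 1.0, 1_000_001)
print(f(x).mean(), L_f(x).mean())  # both approximately 5/12 = 0.41666...
```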

The next piece of terminology comes from Eslami and Góra [12], where it was used informally. The terminology will help to clarify the proposition and its proof.

Definition 3.2.7. Let $X = [a_X, b_X]$ and $T$ be as in M, and let $(x, s) \in E$. We say that $(x, s)$ is a hanging point for $T$ when $T(x^s) \notin \{a_X, b_X\}$, i.e. $\mathbf{1}_{\{a_X, b_X\}}(T(x^s)) = 0$. Denote the set of all hanging points of $T$ by $H$.


Figure 3.4: A map T : [0, 1] → [0, 1] with four hanging points, namely: (0.4, −), (0.4, +), (0.6, +), and (1, −).

Example 3.2.8. Consider the map T : [0, 1] → [0, 1] in Figure 3.4. Here, we have

E = {(0, +), (0.4, −), (0.4, +), (0.6, −), (0.6, +), (1, −)} .

To find the hanging points, we look for the elements $(x, s)$ of $E$ for which $T(x^s) \notin \{0, 1\}$. In pictures, this means that the endpoint of a branch does not meet the endpoints of $[0, 1]$. These points are then

H = {(0, +), (0.4, +), (0.6, +), (1, −)} .

Example 3.2.9. The map in Figure 1.2 has four hanging points: $(-0.5, -)$, $(-0.5, +)$, $(0.5, -)$, and $(0.5, +)$. Here, $X = [-1, 1]$, and we have that
\[
E = \{(-1, +), (-0.5, -), (-0.5, +), (0, -), (0, +), (0.5, -), (0.5, +), (1, -)\}.
\]
For the one-tailed points $(x, s) \in E$ where $x \in \{-1, 0, 1\}$, we have $T(x^s) \in \{-1, 1\}$, and in the other four cases we have $T(x^s) \notin \{-1, 1\}$, again by simply looking at the picture. Note that $T$ is continuous at $\pm 0.5$ (where there are hanging points), whereas it is not continuous at 0 (where there are no hanging points). Jumps or the lack thereof do not affect whether or not a one-tailed point is hanging.

The Balanced Lasota-Yorke-type Inequality

We can now state our new inequality.

Proposition 3.2.10. Let $X$, $\lambda$, and $T$ satisfy the assumptions in M, and let $H$ be the set of hanging points for $T$. Suppose further that
\[
\sup_{n \in N} \left\{ \frac{\operatorname{Var}_{I_n^o}(g)}{\lambda(I_n)} \right\} < \infty, \qquad \text{and} \qquad \sum_{(z, s) \in H} g(z^s) < \infty.
\]
Then for any $[f] \in BV(\lambda)$ and any finite collection of closed intervals $J = \{J_m\}_{m=1}^M$ with disjoint non-empty interiors such that $\bigcup_{m=1}^M J_m$ contains $H$, we have:
\[
\widetilde{\operatorname{Var}}(L[f]) \le \left( \sup_n \left\{ \left\| g \right\|_{I_n^o, \infty} + \operatorname{Var}_{I_n^o}(g) \right\} + \max_m \{h_J(m)\} \right) \widetilde{\operatorname{Var}}[f] + \left( \sup_n \left\{ \frac{\operatorname{Var}_{I_n^o}(g)}{\lambda(I_n)} \right\} + \max_m \left\{ \frac{h_J(m)}{\lambda(J_m)} \right\} \right) \|[f]\|_1,
\]
where $h_J(m) := \sum_{(z, s) \in H \cap J_m} g(z^s)$.

Proof. Recall that $K_n = T_n(I_n)$ and $K_n^o = T(I_n^o)$; let $K_n = [c_n, d_n]$. By Lemmas 3.1.15 and 3.2.4, we know that for $[f] \in BV(\lambda)$ we have:
\[
\widetilde{\operatorname{Var}}(L[f]) \le \sum_n \widetilde{\operatorname{Var}}\left( F_{K_n}\left[ g \circ T_n^{-1} \cdot f \circ T_n^{-1} \right] \right).
\]
Each of the variation terms in the summation breaks into three parts: the variation on the interior and the two endpoint terms, using Lemma 3.1.10. Suppose that $T_n$ is increasing. Then $T_n(a_n) = c_n$ and $T_n(b_n) = d_n$, and the $1 - \mathbf{1}_{\{a_X\}}(c_n)$, $1 - \mathbf{1}_{\{b_X\}}(d_n)$ terms are non-zero exactly when $(a_n, +)$, $(b_n, -)$ are hanging points for $T$. Thus we have, using Lemma 3.1.8:
\[
\left| [g] \circ T_n^{-1}(c_n^+) \right| \left| [f] \circ T_n^{-1}(c_n^+) \right| (1 - \mathbf{1}_{\{a_X\}}(c_n)) + \left| [g] \circ T_n^{-1}(d_n^-) \right| \left| [f] \circ T_n^{-1}(d_n^-) \right| (1 - \mathbf{1}_{\{b_X\}}(d_n))
= \left| [g](a_n^+) \right| \left| [f](a_n^+) \right| \mathbf{1}_H(a_n, +) + \left| [g](b_n^-) \right| \left| [f](b_n^-) \right| \mathbf{1}_H(b_n, -).
\]

The case where $T_n$ is decreasing yields the same equation. For the interior term, since $T_n$ is either increasing or decreasing, we have that
\[
\operatorname{Var}_{T(I_n^o)}\left( g \circ T_n^{-1} \cdot f \circ T_n^{-1} \right) = \operatorname{Var}_{I_n^o}(g \cdot f).
\]
Taking an infimum over elements of $[g \cdot f]$ yields $\widetilde{\operatorname{Var}}_{I_n^o}[g \cdot f] \ge \widetilde{\operatorname{Var}}_{K_n^o}\left[ g \circ T_n^{-1} \cdot f \circ T_n^{-1} \right]$, and taking an infimum over elements of $\left[ g \circ T_n^{-1} \cdot f \circ T_n^{-1} \right]$ yields the opposite inequality, which shows
\[
\widetilde{\operatorname{Var}}_{I_n^o}[g \cdot f] = \widetilde{\operatorname{Var}}_{K_n^o}\left[ g \circ T_n^{-1} \cdot f \circ T_n^{-1} \right].
\]

Now, in the case that neither $\{c_n\}$ nor $\{d_n\}$ has positive $\lambda$-measure, putting these estimates together and using Lemma 3.1.10 gives
\[
\widetilde{\operatorname{Var}}(L[f]) \le \sum_n \widetilde{\operatorname{Var}}_{I_n^o}[g \cdot f] + g(a_n^+) \left| f(a_n^+) \right| \mathbf{1}_H(a_n, +) + g(b_n^-) \left| f(b_n^-) \right| \mathbf{1}_H(b_n, -).
\]

If $\{c_n\}$ has positive measure, say, then note that the value that $F_{K_n}(g \circ T_n^{-1} \cdot f \circ T_n^{-1})$ takes at $c_n$ is the one-sided limit $g(a_n^+)[f](a_n^+)$ (assuming for simplicity that $T_n$ is increasing here), and so the equivalence class also takes that value there. In the case where $c_n \ne a_X$, we have some other $c_m < c_n$, and $c_n$ has a positive measure tail on which $F_{K_n}(g \circ T_n^{-1} \cdot f \circ T_n^{-1})$ is zero, and so the contribution to the variation of $[F_{K_n}(g \circ T_n^{-1} \cdot f \circ T_n^{-1})]$ is minimized by $g(a_n^+) \left| [f](a_n^+) \right|$ (attained by the function $F_{K_n}(g \circ T_n^{-1} \cdot f \circ T_n^{-1})$). The analogous argument works for $T_n$ decreasing, or for $d_n$.

For the sum of the variation terms, we use the product inequality in Lemma 3.1.3 and the estimates on the essential supremum in Lemma 3.1.2 to see that (using plain variation for the actual function $g$, since $\widetilde{\operatorname{Var}}_{I_n^o}[g] \le \operatorname{Var}_{I_n^o}(g)$)
\[
\widetilde{\operatorname{Var}}_{I_n^o}[g \cdot f] \le \left\| g \right\|_{I_n^o, \infty} \widetilde{\operatorname{Var}}_{I_n^o}[f] + \left\| f \right\|_{I_n^o, \infty} \operatorname{Var}_{I_n^o}(g)
\le \left\| g \right\|_{I_n^o, \infty} \widetilde{\operatorname{Var}}_{I_n^o}[f] + \left( \widetilde{\operatorname{Var}}_{I_n^o}[f] + \frac{1}{\lambda(I_n^o)} \int_{I_n^o} |[f]| \, d\lambda \right) \operatorname{Var}_{I_n^o}(g)
= \left( \left\| g \right\|_{I_n^o, \infty} + \operatorname{Var}_{I_n^o}(g) \right) \widetilde{\operatorname{Var}}_{I_n^o}[f] + \frac{\operatorname{Var}_{I_n^o}(g)}{\lambda(I_n^o)} \int_{I_n^o} |[f]| \, d\lambda.
\]

We have assumed that the ratio of the variation of $g$ on $I_n^o$ to the measure of $I_n^o$ is bounded in $n$, so we obtain
\[
\sum_{n \in N} \widetilde{\operatorname{Var}}_{I_n^o}[g \cdot f] \le \sup_{n \in N} \left\{ \left\| g \right\|_{I_n^o, \infty} + \operatorname{Var}_{I_n^o}(g) \right\} \sum_n \widetilde{\operatorname{Var}}_{I_n^o}[f] + \sup_{n \in N} \left\{ \frac{\operatorname{Var}_{I_n^o}(g)}{\lambda(I_n^o)} \right\} \int_X |[f]| \, d\lambda.
\]

For the sum of the hanging point terms, we see that it is simply a sum over all of the hanging points for $T$. Using Corollary 3.1.11, for a hanging point $(x, s)$ in the interval $J_m$ we obtain
\[
|[f](x^s)| \le \frac{1}{\lambda(J_m)} \int_{J_m} |[f]| \, d\lambda + \widetilde{\operatorname{Var}}_{J_m}[f].
\]

For $H$ the set of hanging points, we have
\[
\sum_{n \in N} \left| [g](b_n^-) \right| \left| [f](b_n^-) \right| \mathbf{1}_H(b_n, -) + \left| [g](a_n^+) \right| \left| [f](a_n^+) \right| \mathbf{1}_H(a_n, +) = \sum_{m=1}^M \sum_{(x, s) \in H \cap J_m} [g](x^s) \, |[f](x^s)|.
\]

We then bound each $|[f](x^s)|$ term above by the estimate using the containing interval, and obtain:
\[
\sum_{m=1}^M \sum_{(x, s) \in H \cap J_m} [g](x^s) \, |[f](x^s)|
\le \sum_{m=1}^M \left( \sum_{(x, s) \in H \cap J_m} [g](x^s) \right) \frac{1}{\lambda(J_m)} \int_{J_m} |[f]| \, d\lambda + \sum_{m=1}^M \left( \sum_{(x, s) \in H \cap J_m} g(x^s) \right) \widetilde{\operatorname{Var}}_{J_m}[f]
\le \max_m \left\{ \frac{h_J(m)}{\lambda(J_m)} \right\} \|[f]\|_1 + \max_m \{h_J(m)\} \, \widetilde{\operatorname{Var}}[f],
\]
where we use the notation $h_J(m)$ for the sum of the $g(x^s)$ terms over $(x, s) \in H \cap J_m$, and at the end we bounded the sum of the $\widetilde{\operatorname{Var}}_{J_m}[f]$ terms by $\widetilde{\operatorname{Var}}[f]$ thanks to Lemma 3.1.10. The proposition statement follows by combining these two upper bounds.

Remark 3.2.11. Now that we have proven Proposition 3.2.10, we will have no further need to work with specific bounded variation functions or elements of $BV(\lambda)$. Accordingly, we will begin to abuse notation and conflate a bounded variation function $f : X \to \mathbb{R}$ with its equivalence class $[f] \in BV(\lambda)$. Where the distinction matters in a proof, it will be made apparent.

Chapter 4

Application to Cocycles of Perron-Frobenius Operators

Let $(\Omega, \tau)$ be homeomorphic to a Borel subset of a Polish space, $\mu$ a Borel probability measure on $(\Omega, \sigma(\tau))$, and $\sigma : \Omega \to \Omega$ a homeomorphism, so that $(\Omega, \mu, \sigma)$ is an ergodic, invertible, probability-preserving transformation. The goal of this chapter is to use Theorem 2.4.3 to study random dynamical systems based on the following class of maps.

Definition 4.0.1. Let $\epsilon_1, \epsilon_2 \in [0, 1]$ be given. Let $T_{\epsilon_1, \epsilon_2} : [-1, 1] \to [-1, 1]$ be given by:
\[
T_{\epsilon_1, \epsilon_2}(x) =
\begin{cases}
2(1 + \epsilon_1)(x + 1) - 1, & x \in [-1, -1/2], \\
-2(1 + \epsilon_1)x - 1, & x \in [-1/2, 0), \\
0, & x = 0, \\
-2(1 + \epsilon_2)x + 1, & x \in (0, 1/2], \\
2(1 + \epsilon_2)(x - 1) + 1, & x \in [1/2, 1].
\end{cases}
\]
The map $T_{\epsilon_1, \epsilon_2}$ will be called a paired tent map. The name "paired tent map" arises because there are two tent maps paired together to create a single map. See Figure 4.1 for an example. Note that for all $\epsilon_1, \epsilon_2$, the map $T_{\epsilon_1, \epsilon_2}$ is non-singular with respect to normalized Lebesgue measure $\lambda$.
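For concreteness, the definition transcribes directly into code; the following sketch (ours, purely illustrative; the function name `paired_tent` is ad hoc) evaluates $T_{\epsilon_1, \epsilon_2}$ pointwise and can be used to reproduce Figure 4.1.

```python
import numpy as np

def paired_tent(x, eps1, eps2):
    """Evaluate the paired tent map T_{eps1, eps2} of Definition 4.0.1."""
    x = np.asarray(x, dtype=float)
    return np.select(
        [x <= -0.5, x < 0.0, x == 0.0, x <= 0.5],
        [2.0 * (1.0 + eps1) * (x + 1.0) - 1.0,   # left tent, rising branch
         -2.0 * (1.0 + eps1) * x - 1.0,          # left tent, falling branch
         0.0,                                     # the single point x = 0
         -2.0 * (1.0 + eps2) * x + 1.0],         # right tent, falling branch
        default=2.0 * (1.0 + eps2) * (x - 1.0) + 1.0,  # right tent, rising
    )

print(paired_tent(np.linspace(-1, 1, 9), 0.3, 0.7))  # parameters of Fig. 4.1
```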

Fix measurable functions $\epsilon_1, \epsilon_2 : \Omega \to [0, 1]$. For each $\omega \in \Omega$ let $T_\omega := T_{\epsilon_1(\omega), \epsilon_2(\omega)}$ and let $L_\omega$ be the Perron-Frobenius operator corresponding to $T_\omega$. This generates a random dynamical system: a cocycle of maps $T_\omega^{(n)}$ and the associated cocycle of Perron-Frobenius operators $L_\omega^{(n)}$ over the base timing system $(\Omega, \mathcal{B}, \mu, \sigma)$. For shorthand, we will write $T_\omega^{(n)}$ for the system, since the base timing system is given. Since each $T_\omega$ is piecewise linear, every map fits the situation of Example 3.2.5, and so each of the maps $T_\omega$ and operators $L_\omega$ satisfies the conditions M from Section 3.2. As outlined in the Introduction, we are interested in the existence of equivariant densities for $T_\omega^{(n)}$ and lower bounds on the mixing rate (if there is one). If we can apply


Figure 4.1: The paired tent map, with parameters $\epsilon_1 = 0.3$ and $\epsilon_2 = 0.7$.

Theorem 2.4.3 to this situation with explicit estimates on the quantities $k_P$, $\mu(G_P)$, and $D_P$ in the bound, then we can provide an answer to those questions. In particular, we will prove Theorems B and C (stated more precisely as Theorem 4.5.1 and Theorem 4.6.1) by applying Theorem 2.4.3 to a cocycle of Perron-Frobenius operators on the Banach space $BV(\lambda)$ with a nice cone $C_a$ from Example 3.1.19.

Remark 4.0.2. Given the form of $C_a$, with the condition defined by an inequality involving the variation and the integral, it seems reasonable that we could use a Lasota-Yorke-type inequality to find $a$ such that $L_\omega$ preserves $C_a$ for all $\omega$. To wit, suppose that $L_\omega$ satisfies the inequality
\[
\operatorname{Var}(L_\omega(f)) \le C_{\operatorname{Var}} \operatorname{Var}(f) + C_1 \|f\|_1
\]
uniformly in $\omega$. For $f \in C_a$, we have:
\[
\operatorname{Var}(L_\omega(f)) \le C_{\operatorname{Var}} \operatorname{Var}(f) + C_1 \|f\|_1 \le C_{\operatorname{Var}}\, a \|f\|_1 + C_1 \|f\|_1 = (C_{\operatorname{Var}}\, a + C_1) \|f\|_1.
\]
Then, we try to find $a > 0$ to solve $C_{\operatorname{Var}}\, a + C_1 \le a$, which formally rearranges to
\[
\frac{C_1}{1 - C_{\operatorname{Var}}} \le a.
\]
As long as $C_{\operatorname{Var}} < 1$, the quantity $a$ may be chosen to be the minimum value $C_1 (1 - C_{\operatorname{Var}})^{-1}$.

In fact, we can do better than simply finding that $L_\omega C_a \subseteq C_a$ for all $\omega$ for the above $a$: when $C_{\operatorname{Var}} < 1$ we can pick $\nu \in (C_{\operatorname{Var}}, 1)$ and, exactly as above, show that $L_\omega C_a \subseteq C_{\nu a}$ for all $\omega$, simply by instead solving $C_{\operatorname{Var}}\, a + C_1 \le \nu a$ for $a$. In this case, we have
\[
\frac{C_1}{\nu - C_{\operatorname{Var}}} \le a,
\]
so we choose $a$ to be the minimum value. This improved containment alone is not enough to show contraction of the cone, but in Section 4.5 we will use it in conjunction with other properties to show that the cone is contracted.

However, we cannot simply apply any of the standard Lasota-Yorke inequalities to the maps $T_\omega$ to find a cone. To see why, consider the case where $\epsilon_1 = 0$ on a set of positive measure, and note that $|T_\omega'(x)| = 2$ for $x \in (-1, -0.5) \cup (-0.5, 0)$. That means the scaling function $g_\omega(x)$ is constant with the value $\frac{1}{2}$ on that region. Using the classical Lasota-Yorke inequality from [29], we would obtain
\[
\operatorname{Var}(L_\omega(f)) \le \frac{2}{2} \operatorname{Var}(f) + \left( \frac{2}{2} \cdot \frac{1}{1/4} \right) \|f\|_1 = \operatorname{Var}(f) + 4 \|f\|_1.
\]
This is ill-suited to our purposes, because the variation coefficient is not strictly less than 1. The inequality used in Rychlik [48] is also ill-suited, for the same reason: the variation coefficient is larger than 1. Finally, our new inequality, Proposition 3.2.10, may also not be used, because the variation coefficient is
\[
\sup_{n \in N} \left\{ \left\| g \right\|_{I_n^o, \infty} + \operatorname{Var}_{I_n^o}(g) \right\} + \max_m \{h_J(m)\} \ge \left( \frac{1}{2} + 0 \right) + \frac{1}{2} = 1,
\]
where we note that the largest $h_J(m)$ must be at least $\frac{1}{2}$, since there are two hanging points in $[-1, 0]$, namely $(-0.5, -)$ and $(-0.5, +)$, and we have $g((-0.5)^{\pm}) = \frac{1}{2}$. Any choice of $J$ must have one interval containing at least one of those hanging points. While it is possible to choose $J$ such that the constant attains the value 1, it cannot be brought lower, and hence this inequality fails to be useful in this situation.
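The rearrangement above is elementary, but it is the computation we will use repeatedly; the following helper (ours, purely illustrative) packages it. With the constants $C_{\operatorname{Var}} = 3/4$ and $C_1 = 6$ that Proposition 4.2.1 below provides for the second-iterate cocycle, it recovers the cone parameters used later (for instance $a = 48$, $\nu = 7/8$, and the containment in $C_{42}$ seen in Example 4.3.1).

```python
def min_cone_parameter(c_var: float, c_one: float, nu: float = 1.0) -> float:
    """Smallest a with c_var*a + c_one <= nu*a, so that L maps C_a into C_{nu*a}."""
    assert c_var < nu <= 1.0, "need C_Var < nu <= 1 for the rearrangement"
    return c_one / (nu - c_var)

print(min_cone_parameter(0.75, 6.0))         # 24.0: P_omega C_24 inside C_24
print(min_cone_parameter(0.75, 6.0, 7 / 8))  # 48.0: P_omega C_48 inside C_42
```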

In response to Remark 4.0.2, we do the cocycle analogue of taking powers of the map to increase the expansion factor: we take the second iterate of the cocycle, $L_\omega^{(2)} = L_{\sigma(\omega)} \circ L_\omega$, and use the inequality with the new maps instead. Unlike in the deterministic case (as in [29, 33] and others), there is a subtlety to this approach unique to the random case: $\sigma^2$ may no longer be ergodic on $\Omega$. The workaround is Lemma 4.1.1, which says that even if $\sigma^2$ is not ergodic on $\Omega$, we may easily find an invariant subset of positive measure on which it is ergodic (with respect to the normalized restricted measure). We then use Proposition 3.2.10 to find a Lasota-Yorke inequality that does allow the second iterate to preserve one of the $C_a$ cones, uniformly in $\omega$.

After finding a candidate cone for use in Theorem 2.4.3, we will utilize the notion of "covering", following both Liverani [33] and Buzzi [8], and some combinatorial arguments to explicitly identify the constants in the theorem statement and prove Theorem 4.5.1. Tweaking the combinatorics allows for different contraction estimates, which allows us to investigate what happens when we scale the $\epsilon_1, \epsilon_2$ parameters by $\kappa$ and shrink $\kappa$ to zero, imitating a perturbation of the map $T_{0,0}$; the result is Theorem 4.6.1, proven in Section 4.6. In all cases, we use Lemma 4.1.2 to translate our results from the second-iterate cocycle to the original cocycle of operators $L_\omega$.

4.1 Cyclic Decomposition

As mentioned in the introduction to this chapter, we want to consider the second-iterate cocycle instead of the first iterate, but to do so we need to look at the map $\sigma^2$, a power of an ergodic map. For this purpose, we state and prove the following lemma about general powers of ergodic maps.

Lemma 4.1.1. Let $(\Omega, \mu, \sigma)$ be an ergodic probability-preserving transformation, and let $k \in \mathbb{Z}_{\ge 1}$. Then there exist an integer $l$ dividing $k$ and a measurable $A \subseteq \Omega$ such that $\mu(A) = \frac{1}{l}$, the sets $\{\sigma^{-i}(A)\}_{i=0}^{l-1}$ are disjoint, $A = \sigma^{-l}(A)$, and $(A, \mu|_A, \sigma^k)$ is ergodic. Moreover, if $\sigma$ is invertible, so is $\sigma^k$ on $A$.

The general point of this lemma is to illustrate the structure of powers of an ergodic map; the finest ergodic decompositions have a maximum number of ergodic components with positive measure and the components map to each other cyclically via the original map. Moreover, the number of components must be a divisor of the power.

 −k Proof. Let Mσ = µ(A): A = σ (A),A ⊆ Ω, µ(A) > 0 . Mσ is non-empty be- −k −k cause X = σ (X) has positive measure, so 1 = µ(X) ∈ Mσ. For any A = σ (A) k−1 with µ(A) > 0, we may set B = S σ−i(A) and observe that i=0

k−1 k−1 [ [ σ−1(B) = σ−i+1(A) = σ−k(A) ∪ σ−i(A) = B, i=0 i=1 so that µ(B) = 1 (by ergodicity of σ and the fact that µ(B) > µ(A) > 0). Subadditivity of µ implies that k−1 X 1 = µ(B) ≤ µ(σ−i(A)) = kµ(A), i=0

94 1 1 which yields µ(A) ≥ k . Hence inf(Mσ) ≥ k . −k 1 Now, by definition, we may find A ⊆ Ω with A = σ (A) and k ≤ µ(A) < 1 inf(Mσ) + k . Let l = min j ≥ 1 : µ(σ−j(A) ∩ A) > 0 . By our choice of A, it is clear that l ≤ k and hence is finite. We then see that

\[
\sigma^{-k}(\sigma^{-l}(A) \cap A) = \sigma^{-l}(\sigma^{-k}(A)) \cap \sigma^{-k}(A) = \sigma^{-l}(A) \cap A,
\]
so that $\sigma^{-l}(A) \cap A$ is $\sigma^k$-invariant. Since this set also has positive measure, we have $\mu(\sigma^{-l}(A) \cap A) \in M_\sigma$; hence $\mu(\sigma^{-l}(A) \cap A) \ge \frac{1}{k}$. Moreover, if two sets are $\sigma^k$-invariant, then their difference is as well. If $A \setminus \sigma^{-l}(A)$ has positive measure, then this measure lies in $M_\sigma$ and must be at least $\frac{1}{k}$, which gives:
\[
\frac{1}{k} \le \mu(A \setminus \sigma^{-l}(A)) = \mu(A) - \mu(\sigma^{-l}(A) \cap A) < \inf(M_\sigma) + \frac{1}{k} - \inf(M_\sigma) = \frac{1}{k}.
\]
We have obtained a contradiction, so we see that $A = \sigma^{-l}(A) \cap A$ $\mu$-a.e., hence $A = \sigma^{-l}(A)$ $\mu$-a.e.

We can then write $C = \bigcup_{i=0}^{l-1} \sigma^{-i}(A)$ and note that $\sigma^{-1}(C) = C$ almost everywhere, since $\sigma^{-l}(A) = A$ almost everywhere. $C$ has positive measure, so we have $\mu(C) = 1$ by ergodicity of $\sigma$, and this fact, together with the fact that the sets $\sigma^{-i}(A)$ are $\mu$-a.e. disjoint for $i = 0, \ldots, l-1$, gives us:

\[
1 = \mu(C) = \sum_{i=0}^{l-1} \mu(\sigma^{-i}(A)) = l \mu(A) \le 1.
\]
Hence $\mu(A) = \frac{1}{l}$. Next, write $k = ml + r$ for some non-negative integers $m$ and $r < l$. Note that since $A = \sigma^{-l}(A)$ $\mu$-a.e., we also have that $A = \sigma^{-il}(A)$ $\mu$-a.e. Then, if $r > 0$, we get:
\[
0 = \mu(A \cap \sigma^{-r}(A)) = \mu\left( \sigma^{-k}(A) \cap \sigma^{-r}(A) \right) = \mu\left( \sigma^{-r}\left( \sigma^{-ml}(A) \cap A \right) \right) = \mu\left( \sigma^{-ml}(A) \cap A \right),
\]
which is a contradiction, since the right-hand side equals $\mu(A) > 0$. Thus $r = 0$, and $l \mid k$.

Let $\tilde{A} = \bigcap_{i=0}^{m-1} \sigma^{-il}(A)$. This set is $\sigma^l$-invariant (since $l \mid k$), hence $\sigma^k$-invariant also, and has the same measure as $A$ (as every set in the intersection is almost everywhere equal to $A$). We may now replace $A$ with $\tilde{A}$ without losing any of the properties of $A$. Doing so, we gain exact $\sigma^l$-invariance of $A$, and $\{\sigma^{-i}(A)\}_{i=0}^{l-1}$ becomes a collection of mutually disjoint sets.

Now, suppose that there exist $A_1$ and $A_2$ with $\sigma^{-k}(A_j) = A_j$ and $0 < \mu(A_j) < \inf(M_\sigma) + \frac{1}{k}$. By the above work, we may assume $\mu(A_1) = \frac{1}{l_1} < \frac{1}{l_2} = \mu(A_2)$ (where

$m_1 l_1 = k = m_2 l_2$). This fact gives us
\[
\inf(M_\sigma) \le \frac{1}{l_1} < \frac{1}{l_2} < \inf(M_\sigma) + \frac{1}{k},
\]
which leads to
\[
\frac{1}{k}(m_2 - m_1) = \frac{m_2}{k} - \frac{m_1}{k} = \frac{1}{l_2} - \frac{1}{l_1} < \inf(M_\sigma) + \frac{1}{k} - \inf(M_\sigma) = \frac{1}{k}.
\]
Of course, since $l_2 < l_1$, we have $m_2 - m_1 \ge 1$, and so we obtain $\frac{1}{k} < \frac{1}{k}$, also a contradiction. The assumption made was that $A_1$ and $A_2$ have different measures, so we must instead have that whenever $A_1$ and $A_2$ satisfy $\sigma^{-k}(A_j) = A_j$ and $0 < \mu(A_j) < \inf(M_\sigma) + \frac{1}{k}$, they must have the same measure. Then $\inf(M_\sigma) = \frac{1}{l}$ for some unique $l \mid k$, and there exists some $A \subseteq \Omega$ with $\sigma^{-l}(A) = A$, $\{\sigma^{-i}(A)\}_{i=0}^{l-1}$ mutually disjoint, and $\mu(A) = \frac{1}{l}$.

To see that $\sigma^k$ is ergodic on $A$, let $B \subseteq A$ be $\sigma^k$-invariant with positive measure.

Then µ(B) ∈ Mσ, and so we get

\[
\frac{1}{l} = \inf(M_\sigma) \le \mu(B) \le \mu(A) = \frac{1}{l}.
\]
Hence $\mu(B) = \mu(A)$, and since $B \subseteq A$, $B$ equals $A$ up to measure zero; therefore $\sigma^k$ is ergodic on $A$. Finally, if $\sigma$ is invertible, then $\sigma^k$ must be also, so $\sigma^k$ is invertible on $A$ by restriction.

In our situation, we will have $k = 2$, and so either $\sigma^2$ is ergodic on $\Omega$, or it is ergodic on a set $\Omega_1$ with $\mu(\Omega_1) = 1/2$ whose complement $\Omega_2$ satisfies $\sigma(\Omega_1) = \Omega_2$. This plays a role later because, as we will see, the expansion factors of the second iterate depend on both the $\epsilon_i(\omega)$ and $\epsilon_i(\sigma(\omega))$ terms. We will then be able to recover inequalities for the second-largest Lyapunov exponent of the original cocycle from those for the second-iterate cocycle. From here on, let $\Omega_1$ be a set on which $\sigma^2$ is ergodic (potentially all of $\Omega$, but possibly only half of the space) and let $\Omega_2$ either be $\Omega$ if $\sigma^2$ is ergodic or $\Omega \setminus \Omega_1$ if $\sigma^2$ is only ergodic on $\Omega_1$. Let $\mu_1$ be the normalized restriction of $\mu$ to $\Omega_1$.
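A finite toy example (ours, purely illustrative; Lemma 4.1.1 itself concerns general probability spaces) makes the cyclic structure concrete: take $\Omega = \mathbb{Z}/6$ with uniform measure and the ergodic rotation $\sigma(x) = x + 1 \pmod 6$. For $k = 2$, the $\sigma^2$-orbit of $0$ is an invariant set $A$ of measure $1/2 = 1/l$ with $l = 2$ dividing $k$, and $A$ together with $\sigma^{-1}(A)$ partitions $\Omega$, with $\sigma^2$ acting ergodically (as a three-point rotation) on each piece.

```python
n, k = 6, 2
A = sorted({(k * j) % n for j in range(n)})   # the sigma^2-orbit of 0
B = sorted({(a - 1) % n for a in A})          # sigma^{-1}(A)
print(A, B)                                   # [0, 2, 4] [1, 3, 5]
```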

Lemma 4.1.2. Consider a cocycle of bounded linear operators $L = \left( L_\omega^{(n)} \right)$ on $(X, \|\cdot\|)$ over an invertible $\sigma : \Omega \to \Omega$. Let $P_\omega = L_\omega^{(k)}$ be the $k$-th iterate cocycle over $\sigma^k$, and suppose that for all $\omega$ and $0 \le r \le k-1$, $\limsup_{n \to \infty} \frac{1}{n} \log \left\| L_{\sigma^{kn+r}(\omega)} \right\| = 0$. Then for all $\omega \in \Omega$ and $x \in X$, we have $\lambda_P(\omega, x) = k \lambda_L(\omega, x)$, and so multiplication by $k$ is a bijection from $\Lambda_L(\omega)$ to $\Lambda_P(\omega)$; i.e., the Lyapunov exponents of $L$ are just the Lyapunov exponents of $P$ scaled by $\frac{1}{k}$. The hypotheses hold on a $\sigma$-invariant set of full measure in $\Omega$ if $(\Omega, \mu, \sigma)$ is an invertible ergodic probability-preserving transformation and $\log^+ \|L_\omega\|$ is integrable.

Proof. Let $\omega \in \Omega$ and $x \in X$. For each $n \ge k$, write $j_n = \lfloor n/k \rfloor$ and $r_n = n - k j_n$. We use the submultiplicativity of the operator norm to relate $\lambda_L(\omega, x)$ to $\lambda_P(\omega, x)$:
\[
\lambda_L(\omega, x) = \limsup_{n \to \infty} \frac{1}{n} \log \left\| L_\omega^{(n)} x \right\|
= \frac{1}{k} \limsup_{n \to \infty} \frac{k j_n}{k j_n + r_n} \cdot \frac{1}{j_n} \log \left\| L_{\sigma^{k j_n}(\omega)}^{(r_n)} P_\omega^{(j_n)} x \right\|
\le \frac{1}{k} \limsup_{n \to \infty} \left( \sum_{r=0}^{k-1} \frac{1}{j_n} \log \left\| L_{\sigma^{k j_n + r}(\omega)} \right\| + \frac{1}{j_n} \log \left\| P_\omega^{(j_n)} x \right\| \right)
= \frac{1}{k} \limsup_{j \to \infty} \frac{1}{j} \log \left\| P_\omega^{(j)} x \right\| = \frac{1}{k} \lambda_P(\omega, x),
\]
where in the second-last equality we used the hypothesis on $\log \left\| L_{\sigma^{kn+r}(\omega)} \right\|$. Then, for this $\omega$ and $x$, observe that we also have:
\[
\lambda_P(\omega, x) = \limsup_{n \to \infty} \frac{1}{n} \log \left\| P_\omega^{(n)} x \right\| = k \limsup_{n \to \infty} \frac{1}{kn} \log \left\| L_\omega^{(kn)} x \right\| \le k \limsup_{n \to \infty} \frac{1}{n} \log \left\| L_\omega^{(n)} x \right\| = k \lambda_L(\omega, x).
\]

Thus we see that $\lambda_P(\omega, x) = k \lambda_L(\omega, x)$, and so multiplication by $k$ is a surjective map from $\Lambda_L(\omega)$ onto $\Lambda_P(\omega)$. That it is one-to-one follows from the fact that it is (strictly) order-preserving on $\mathbb{R}$; hence it is a bijection.

Finally, if $(\Omega, \mu, \sigma)$ is an invertible ergodic probability-preserving transformation and $\log^+ \|L_\omega\|$ is integrable, then we apply Lemma A.1.2 (which is an expanded version of Proposition A.5(i) from [50]) together with Lemma 4.1.1 to see that on each of the smaller sets where $\sigma^k$ is ergodic, we have
\[
\limsup_{n \to \infty} \frac{1}{n} \log \left\| L_{\sigma^r((\sigma^k)^n(\omega))} \right\| = 0,
\]
hence the same is true for $\mu$-almost every $\omega$ in $\Omega$.

4.2 Uniform Lasota-Yorke Inequality

Now that we see that $L_\omega^{(2)}$ is a cocycle over $(\Omega_1, \mu_1, \sigma^2)$, we can try to show that $L_\omega^{(2)}$ preserves a cone for all $\omega$; our tool will be the following Lasota-Yorke-type inequality arising from Proposition 3.2.10 applied to $T_\omega^{(2)}$. It is straightforward to see that $S_\omega := T_\omega^{(2)}$ is again piecewise linear, as the composition of two piecewise linear maps, and has finitely many branches. Thus $S_\omega$ on $[-1, 1]$ equipped with normalized Lebesgue measure satisfies the conditions M; examples of $S_\omega$ can be found in Figures 4.2 and 4.3.


Figure 4.2: The second iterate of the paired tent map, $S_\omega$, with parameters $\epsilon_1(\omega) = 0.1$, $\epsilon_2(\omega) = 0.2$, $\epsilon_1(\sigma(\omega)) = 0.1$, and $\epsilon_2(\sigma(\omega)) = 0.2$.

For different $\omega$, $S_\omega$ can look somewhat different, as shown in Figures 4.2 and 4.3. These differences turn out to be very important for our analysis, due to the differing numbers of branches and hanging points, and the measures of the branches. Previous Lasota-Yorke-type inequalities involve terms with division by the smallest measure of a branch, or the smallest measure in some other finite partition of $X$ (see the introduction to Chapter 3). When we are considering only one map, these terms are fine, but when we are considering infinitely many maps, and in particular families of maps where the measures of the intervals can tend to zero, these terms fail to be useful uniformly in $\omega$. This is why the balanced (and looser) Lasota-Yorke-type inequality in Proposition 3.2.10 becomes useful: we may find a common inequality for all of these maps, regardless of their exact form, by an appropriate choice of intervals $J_m$ for each subcollection of maps. In particular, through our choices, the variation term becomes suboptimal (compared to other Lasota-Yorke-type inequalities), but the norm term becomes bounded uniformly. This uniformity is the primary reason for proving Proposition 3.2.10. The resulting inequality is the content of the following key proposition.


Figure 4.3: The second iterate of the paired tent map, $S_\omega$, with parameters $\epsilon_1(\omega) = 0.7$, $\epsilon_2(\omega) = 0.3$, $\epsilon_1(\sigma(\omega)) = 0.2$, and $\epsilon_2(\sigma(\omega)) = 0.6$.

Proposition 4.2.1. For any paired tent map cocycle $T_\omega$ over $\sigma$ and associated second-iterate Perron-Frobenius operator $P_\omega$, we have that for any $f \in BV(\lambda)$:
\[
\operatorname{Var}(P_\omega(f)) \le \frac{3}{4} \operatorname{Var}(f) + 6 \|f\|_1.
\]
If moreover $\epsilon_1, \epsilon_2 \le 1/2$ on $\Omega$, then we have the sharper estimate
\[
\operatorname{Var}(P_\omega(f)) \le \frac{1}{2} \operatorname{Var}(f) + 4 \|f\|_1.
\]

Proof. The proof of Proposition 4.2.1 involves computing the function $g_\omega$ for $P_\omega$ and then, in a number of cases, choosing the intervals for $J$ in Proposition 3.2.10 according to the specific structure of the map, to yield at worst the inequalities listed.

We first compute the function $g_\omega = |DS_\omega|^{-1}$. Set $D_i = \left[ \frac{i-3}{2}, \frac{i-2}{2} \right]$ for $i = 1, 2, 3, 4$; these are the intervals of monotonicity for the maps $T_\omega$. Then let $I_{ij}(\omega) = D_i \cap T_\omega^{-1}(D_j)$; these are the intervals of monotonicity for $S_\omega$. Note that depending on $\epsilon_1(\omega)$ and $\epsilon_2(\omega)$, some of the $I_{ij}(\omega)$ may be empty (or single points, but single-point intervals may as well be considered empty from the perspective of hanging points and integrals). When $\omega$ is understood, we will just write $I_{ij}$, and the index set is
\[
N = N_\omega = \left\{ ij : i, j \in \{1, 2, 3, 4\},\ D_i \cap T_\omega^{-1}(D_j) \text{ is non-trivial} \right\}.
\]

The intervals $I_{ij}$ are ordered left-to-right along $[-1, 1]$ as follows:
\[
I_{11}, I_{12}, I_{13}, I_{14}, I_{24}, I_{23}, I_{22}, I_{21}, I_{34}, I_{33}, I_{32}, I_{31}, I_{41}, I_{42}, I_{43}, I_{44}.
\]
In Figure 4.3, the only empty intervals are $I_{31}$ and $I_{41}$, since the map does not make a "W" shape near $x = 1/2$. In Figure 4.2, all of $I_{14}$, $I_{24}$, $I_{31}$, and $I_{41}$ are empty. When both $\epsilon_1$ and $\epsilon_2$ are zero, all of the intervals $I_{ij}$ surrounding $-1/2$ and $1/2$ (two on each side of each point) are empty; these are the $I_{ij}$ where exactly one of $i, j$ is at most 2. Each of the $I_{ij}$ may be explicitly computed when non-trivial; the computation amounts to taking preimages of the endpoints of $D_j$ under the $D_i$ branch of $T_\omega$.

For notation, let $\eta_1 = \eta_2 = \epsilon_1$ and $\eta_3 = \eta_4 = \epsilon_2$. By the formula in Definition 4.0.1 and the Chain Rule, we see that for $x \in I_{ij}^o$,
\[
g_\omega(x) = \frac{1}{4(1 + \eta_j(\sigma(\omega)))(1 + \eta_i(\omega))}.
\]
Of course, $g_\omega = 0$ at endpoints of the $I_{ij}$.

Note that the map $S_\omega$ depends on all four quantities $\epsilon_1(\omega)$, $\epsilon_1(\sigma(\omega))$, $\epsilon_2(\omega)$, and $\epsilon_2(\sigma(\omega))$, but only the two quantities $\epsilon_1(\omega)$ and $\epsilon_2(\omega)$ affect the complexity of the map. By affecting the complexity, we mean that $\epsilon_1(\omega)$ and $\epsilon_2(\omega)$ change the branch structure of the map, since the intervals of monotonicity are $I_{ij} = D_i \cap T_\omega^{-1}(D_j)$, and $T_\omega$ only depends on $\epsilon_1(\omega)$ and $\epsilon_2(\omega)$. The other two quantities affect the expansion of the branches (how much of the space the branches cover), but they do not change the branch structure. Moreover, the quantities $\epsilon_1(\omega)$ and $\epsilon_2(\omega)$ only affect the behaviour of $S_\omega$ on $[-1, 0]$ and $[0, 1]$, respectively. To simplify the exposition, we will investigate the bounds for different ranges of $\epsilon_1$, obtain those for $\epsilon_2$ by symmetry, and then obtain the complete bound by taking the maximum over the pairings.

So, fix $\omega$; we will consider the intervals in $[-1, 0]$, which means we will look at the three cases of $\epsilon_1(\omega)$: $\epsilon_1 = 0$, $\epsilon_1 \in (0, 1/2]$, and $\epsilon_1 \in (1/2, 1]$. Note that in every case, $g$ is constant on the (interiors of the) intervals of monotonicity $I_{ij}^o$, so that each of the $\operatorname{Var}_{I_{ij}^o}(g)$ terms is zero.

First, assume that $\epsilon_1(\omega) = 0$; this is where, as in Figure 4.2, there would only be the two large tents on $[-1, 0]$, and nothing special happening near $x = -1/2$. The hanging points in $[-1, 0]$ are $(-3/4, -)$, $(-3/4, +)$, $(-1/4, -)$, and $(-1/4, +)$. Choose
\[
J_1 = [-1, -3/4], \quad J_2 = [-3/4, -1/2], \quad J_3 = [-1/2, -1/4], \quad J_4 = [-1/4, 0].
\]
These happen to be $I_{11}$, $I_{12}$, $I_{22}$, and $I_{21}$, respectively; it is easy to see that each interval contains exactly one of the hanging points. Each of these intervals has measure $\lambda(J_m) = \frac{1}{4} \cdot \frac{1}{2} = \frac{1}{8}$, and we may compute $h_J(m)$ for each $m$ using the definition:

\[
h_J(1) = g((-3/4)^-) = \frac{1}{4(1 + \epsilon_1(\sigma(\omega)))}, \qquad h_J(2) = g((-3/4)^+) = \frac{1}{4(1 + \epsilon_1(\sigma(\omega)))},
\]
\[
h_J(3) = g((-1/4)^-) = \frac{1}{4(1 + \epsilon_1(\sigma(\omega)))}, \qquad h_J(4) = g((-1/4)^+) = \frac{1}{4(1 + \epsilon_1(\sigma(\omega)))},
\]
matching each hanging point to the interval containing it. Each of those values is at most $\frac{1}{4}$, and this is the best possible uniform bound.

Next, assume that $\epsilon_1(\omega) \in (0, 1/2]$, as in Figure 4.2. Here, there are two new hanging points at $x = -1/2$, because the map $T_\omega$ leaks mass near $-1/2$ into $[0, 1]$ (which one can see in Figure 4.1). The hanging points in $[-1, 0]$ are $(t_1, \pm)$, $(-1/2, \pm)$, and $(t_2, \pm)$, where
\[
t_1(\omega) = -\frac{3/4 + \epsilon_1(\omega)}{1 + \epsilon_1(\omega)}, \qquad t_2(\omega) = -\frac{1}{4(1 + \epsilon_1(\omega))};
\]
the points $t_1$, $-1/2$, and $t_2$ are in order, left-to-right. We need to cover the six hanging points with intervals $J_m$, but we have to do this in such a way that the measures of the intervals cannot potentially vanish with shrinking $\epsilon_1$. Noting that $\epsilon_1 \in (0, 1/2]$ and that $t_1$ and $t_2$ are continuous in $\epsilon_1$ tells us that $t_1 \in \left[ -\frac{5}{6}, -\frac{3}{4} \right)$ and $t_2 \in \left( -\frac{1}{4}, -\frac{1}{6} \right]$. Using this information, we can choose the following intervals:

\[
J_1 = [-1, t_1], \quad J_2 = [t_1, -5/8], \quad J_3 = [-5/8, -1/2], \quad J_4 = [-1/2, -3/8], \quad J_5 = [-3/8, t_2], \quad J_6 = [t_2, 0].
\]
Each of these intervals has measure at least $\frac{1}{16}$ (since we are using normalized Lebesgue measure on $[-1, 1]$), and contains exactly one hanging point. Moreover, at each hanging point $(x, s)$,
\[
g_\omega(x^s) = \frac{1}{4(1 + \eta_i(\omega))(1 + \eta_j(\sigma(\omega)))} \le \frac{1}{4(1 + \epsilon_1(\omega))} \le \frac{1}{4},
\]
which is the best uniform bound in this case, because $\epsilon_1(\omega)$ can approach 0. This bound means that each $h_J(m)$ is bounded above by $\frac{1}{4}$.

Finally, assume that $\epsilon_1(\omega) \in (1/2, 1]$, as in Figure 4.3. The complexity of the map has increased again, in the sense that there are more branches and there are now ten hanging points. Moreover, this case poses a new difficulty over the previous cases: depending on $\epsilon_1$, the hanging points in the middle branches can become arbitrarily close together. This means that we have two choices: either we change the intervals based on $\epsilon_1$ and end up with arbitrarily small intervals (as Rychlik does for a single map in [48]), or we restrict the size of our intervals and allow multiple hanging points in some of the intervals. We cannot use the former option, because we need uniformity in the inequality, and choosing ever-smaller intervals means the $\|\cdot\|_1$ coefficient explodes. Thus, we choose intervals in such a way as to minimize the contributions from the $h_J(m)$ while keeping the intervals from being too small. We choose the intervals

\[
J_1 = [-1, j_1], \quad J_2 = [j_1, -1/2], \quad J_3 = [-1/2, j_2], \quad J_4 = [j_2, 0],
\]
where $j_1$ and $j_2$ are the two jumps that the map $S_\omega$ takes in $[-1, 0]$. These jump points are the zeroes of $T_\omega$, because those points are where $T_\omega$ switches from staying in $[-1, 0]$ to leaking over to $[0, 1]$. They are given by
\[
j_1(\omega) = -\frac{1}{2} - \frac{\epsilon_1(\omega)}{2(1 + \epsilon_1(\omega))}, \qquad j_2(\omega) = -\frac{1}{2} + \frac{\epsilon_1(\omega)}{2(1 + \epsilon_1(\omega))}.
\]
As for $t_1$ and $t_2$ in the previous case, some simple analysis of $j_1$ and $j_2$ in terms of $\epsilon_1$ indicates that $j_1 \in \left[ -\frac{3}{4}, -\frac{2}{3} \right)$ and $j_2 \in \left( -\frac{1}{3}, -\frac{1}{4} \right]$. The intervals $J_1$ and $J_4$ thus have measures at least $\frac{1}{8}$, and the intervals $J_2$ and $J_3$ have measures at least $\frac{1}{12}$. Each of the intervals $J_1$ and $J_4$ contains two hanging points: for $t_1, t_2$ as in the previous case, $J_1$ contains $(t_1, \pm)$ and $J_4$ contains $(t_2, \pm)$. Using our knowledge of $g$ and the values of $\epsilon_1$, we obtain:

\[
h_J(1) = g(t_1^-) + g(t_1^+) = \frac{1}{4(1 + \eta_1(\omega))(1 + \eta_1(\sigma(\omega)))} + \frac{1}{4(1 + \eta_1(\omega))(1 + \eta_2(\sigma(\omega)))} = \frac{2}{4(1 + \epsilon_1(\omega))(1 + \epsilon_1(\sigma(\omega)))} \le \frac{1}{2\left(1 + \frac{1}{2}\right)(1 + 0)} = \frac{1}{3},
\]
where to find the upper bound we use the fact that $\eta_i(\omega) = \epsilon_1(\omega) \ge \frac{1}{2}$. Similarly,
\[
h_J(4) = g(t_2^-) + g(t_2^+) \le \frac{1}{3}.
\]

For J2 and J3, we must handle the other six hanging points: (t3, ±), (−1/2, ±), and

(t4, ±), left-to-right, where

\[
t_3 = -\frac{1/4 + \epsilon_1(\omega)}{1 + \epsilon_1(\omega)}, \qquad t_4 = -\frac{3}{4(1 + \epsilon_1(\omega))}.
\]

These points are the vertices on the “W” shape; J2 contains (t3, ±) and (−1/2, −), and

102 J3 contains (−1/2, +) and (t4, ±). For J2, we have:

\[
h_J(2) = g(t_3^-) + g(t_3^+) + g((-1/2)^-) = \frac{1}{4(1 + \eta_1(\omega))(1 + \eta_3(\sigma(\omega)))} + \frac{2}{4(1 + \eta_1(\omega))(1 + \eta_4(\sigma(\omega)))} = 3 \cdot \frac{1}{4(1 + \epsilon_1(\omega))(1 + \epsilon_2(\sigma(\omega)))} \le \frac{3}{4\left(1 + \frac{1}{2}\right)(1 + 0)} = \frac{1}{2},
\]
where we ignore $\epsilon_2$ and use the fact that $\epsilon_1 > 1/2$. Again, essentially the same computation gives
\[
h_J(3) = g((-1/2)^+) + g(t_4^-) + g(t_4^+) \le \frac{1}{2}.
\]

 o  VarIn (g) sup o = 0, n∈N λ(In) h (m) 1/4 1/4 1/3 1/2  max J ≤ max , , , = 6, m λ(Jm) 1/8 1/16 1/8 1/12 allowing for every possible pairing of 1 and 2. This means that by Proposition 3.2.10, for any f ∈ BV (λ) we have:

1 1 3 Var(P (f)) ≤ + Var(f) + (0 + 6) kfk = Var(f) + 6 kfk . ω 4 2 1 4 1

This is uniform in ω. For the specific situation of 1, 2 ≤ 1/2 on Ω, doing the same for only cases 1 and 2 yields the inequality

1 Var(P (f)) ≤ Var(f) + 4 kfk , ω 2 1 as desired.
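The locations of the hanging points and jumps drive the whole case analysis, so they are worth spot-checking; the following sketch (ours, informal and not part of the proof) verifies on a grid of $\epsilon_1$ values that $t_1, t_2, j_1, j_2$ lie in the ranges claimed above.

```python
import numpy as np

tol = 1e-12  # slack for floating-point comparisons at boundary values

eps = np.linspace(1e-6, 0.5, 1001)              # case eps_1 in (0, 1/2]
t1 = -(0.75 + eps) / (1.0 + eps)
t2 = -1.0 / (4.0 * (1.0 + eps))
assert np.all((t1 >= -5/6 - tol) & (t1 < -3/4))
assert np.all((t2 > -1/4) & (t2 <= -1/6 + tol))

eps = np.linspace(0.5 + 1e-6, 1.0, 1001)        # case eps_1 in (1/2, 1]
j1 = -0.5 - eps / (2.0 * (1.0 + eps))
j2 = -0.5 + eps / (2.0 * (1.0 + eps))
assert np.all((j1 >= -3/4 - tol) & (j1 < -2/3))
assert np.all((j2 > -1/3) & (j2 <= -1/4 + tol))

print("hanging point and jump locations match the stated ranges")
```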

103 4.3 Covering Properties

As discussed at the beginning of this chapter in Remark 4.0.2, it is not enough to have $P_\omega C_a \subseteq C_{\nu a}$ for $P_\omega$ to contract the cone. The following example illustrates what can go wrong.

Example 4.3.1. Let $f : [-1, 1] \to \mathbb{R}$ be (a representative of the equivalence class $[f]$) given by
\[
f(x) =
\begin{cases}
0, & x \in [-1, 0), \\
8x, & x \in [0, 1/2], \\
8(1 - x), & x \in (1/2, 1],
\end{cases}
\]
as in Figure 4.4(a). Choose, for concreteness, $a = 48$ and $\nu = 7/8$. We see that $\operatorname{Var}(f) = 8 \le 48 = 48 \|f\|_1$, so $f \in C_{48}$, since $f \ge 0$. Moreover, suppose that $\epsilon_1(\omega) = \epsilon_1(\sigma(\omega)) = \epsilon_2(\omega) = \epsilon_2(\sigma(\omega)) = 0$, and apply $P_\omega$ to $f$. Using Lemma 3.2.4, we get:
\[
P_\omega(f) = \sum_n F_{K_n}\left( g_\omega \circ (S_\omega)_n^{-1} \right) F_{K_n}\left( f \circ (S_\omega)_n^{-1} \right) = \frac{1}{4} \sum_n F_{K_n}\left( f \circ (S_\omega)_n^{-1} \right)
= \frac{1}{4} \left( f\left( \frac{1 - (\cdot)}{4} \right) + f\left( \frac{1 + (\cdot)}{4} \right) + f\left( \frac{3 - (\cdot)}{4} \right) + f\left( \frac{3 + (\cdot)}{4} \right) \right) \mathbf{1}_{[0,1]}
= \left( \frac{1 - (\cdot)}{2} + \frac{1 + (\cdot)}{2} + \frac{1 + (\cdot)}{2} + \frac{1 - (\cdot)}{2} \right) \mathbf{1}_{[0,1]} = 2 \cdot \mathbf{1}_{[0,1]}.
\]
The result is depicted in Figure 4.4(b). We see that $\operatorname{Var}(P_\omega(f)) = 2 \le 42 = 42 \|P_\omega(f)\|_1$, so that $P_\omega(f) \in C_{42}$; since $\operatorname{Var}(\mathbf{1}) = 0$, $\mathbf{1}$ is also an element of $C_{42}$. However, $P_\omega(f)$ is supported on $[0, 1]$, which means that $\theta(P_\omega(f), \mathbf{1}) = \infty$: there does not exist $\mu > 0$ such that $\mu P_\omega(f) - \mathbf{1} \ge 0$ (on $[-1, 0]$ this function is always negative; or rather, any member of the equivalence class is $\lambda$-almost-everywhere negative on $[-1, 0]$), hence $\beta(P_\omega(f), \mathbf{1}) = \infty$. In particular, $P_\omega(f)$ has essential infimum equal to 0 (in this case, attained on a set of positive measure).
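Under the stated parameter choice, the computation reduces to the four inverse branches of the squared full tent map on $[0, 1]$, which is easy to confirm numerically; the sketch below (ours; the names are ad hoc) reproduces $P_\omega(f) = 2 \cdot \mathbf{1}_{[0,1]}$.

```python
import numpy as np

def f(x):
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, 0.0, np.where(x <= 0.5, 8.0 * x, 8.0 * (1.0 - x)))

def P_f(x):
    # With all epsilons zero, S_omega restricted to [0, 1] is the full tent
    # map composed with itself: four onto branches, each with g = 1/4.
    x = np.asarray(x, dtype=float)
    preimages = [(1 - x) / 4, (1 + x) / 4, (3 - x) / 4, (3 + x) / 4]
    return 0.25 * sum(f(y) for y in preimages)

print(P_f(np.linspace(0.0, 1.0, 5)))  # constant 2.0 across [0, 1]
```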

We follow Liverani [34] and Buzzi [8] in using covering to obtain contraction of the cone; we work $\lambda$-almost-everywhere, as Buzzi does. Our definitions and lemmas will be stated specifically for maps on the interval; note that we will be applying these lemmas to the second-iterate map $S_\omega$ and operator $P_\omega$, even though we state them for a general "$T_\omega$". We still assume that $(\Omega, \mu, \sigma)$ is an invertible, ergodic, probability-preserving transformation. At the end of this section, we will work with an unrelated example before going back to our paired tent maps.

104 4 4

3.5 3.5

3 3

2.5 2.5

2 2

1.5 1.5

1 1

0.5 0.5

0 0

-0.5 -0.5

-1 -1 -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1

(a) The function f. (b) The function Pωf. Figure 4.4: The function f and its image under the Perron- Frobenius operator Pω. Observe that both functions are sup- ported only on [0, 1]. transformation. At the end of this section, we will work with an unrelated example before going back to our paired tent maps.

Definition 4.3.2. Let $T_\omega : [-1, 1] \to [-1, 1]$ generate a cocycle of maps over $(\Omega, \mu, \sigma)$, each map satisfying the assumptions M, and suppose that for almost every $\omega$ we have $\lambda(T_\omega([-1, 1])) = 1$. Let $L_\omega$ be the associated Perron-Frobenius operator. We say that the cocycle has the dynamical covering property when for $\mu$-almost every $\omega$ and every interval $I \subseteq [-1, 1]$ with positive measure, there exists an $m = m(\omega, I)$ such that for all $m' \ge m$, $\lambda(T_\omega^{(m')}(I)) = 1$. We say that the cocycle has the functional covering property when for $\mu$-almost every $\omega$ and every interval $I \subseteq [-1, 1]$ with positive measure, there exists an $m = m(\omega, I)$ such that for all $m' \ge m$,
\[
\operatorname{ess\,inf}\left( L_\omega^{(m')}(\mathbf{1}_I) \right) > 0.
\]

Note that the functional covering condition prevents the issue in Example 4.3.1, where the essential infimum of $P_\omega(f)$ was equal to zero. This fact will be key to the contraction of the cone.

Remark 4.3.3. In the two definitions of covering, we require an almost-onto condition: $\lambda(T_\omega([-1, 1])) = 1$ for almost every $\omega$. Without this condition, neither covering property can hold. To see this, suppose that it does not hold, so that for a positive measure set $\Omega'$ of $\omega$ we have $\lambda(T_\omega([-1, 1])) < 1$. By ergodicity of $(\mu, \sigma)$, we know that the orbit of almost every $\omega$ enters $\Omega'$ infinitely many times. Choose one of these $\omega$ and some interval $I \subseteq [-1, 1]$ with positive measure. To see that the dynamical covering property does not hold, observe that for $m \in \mathbb{Z}_{\ge 1}$ such that $\sigma^{m-1}(\omega) \in \Omega'$, we have
\[
\lambda(T_\omega^{(m)}(I)) = \lambda\left( T_{\sigma^{m-1}(\omega)}\left( T_\omega^{(m-1)}(I) \right) \right) \le \lambda\left( T_{\sigma^{m-1}(\omega)}([-1, 1]) \right) < 1.
\]
To see that the functional covering property does not hold, note that when $\sigma^{m-1}(\omega) \in \Omega'$, the complement of $T_{\sigma^{m-1}(\omega)}([-1, 1])$ in $[-1, 1]$ is measurable (by the assumptions M) with positive measure. For such $m$ and $x \in [-1, 1] \setminus T_{\sigma^{m-1}(\omega)}([-1, 1])$, we have
\[
L_\omega^{(m)}(\mathbf{1}_I)(x) = \sum_{y \in T_{\sigma^{m-1}(\omega)}^{-1}\{x\}} g_{\sigma^{m-1}(\omega)}(y)\, L_\omega^{(m-1)}(\mathbf{1}_I)(y) = 0,
\]
since there are no pre-images of $x$ under $T_{\sigma^{m-1}(\omega)}$. This holds on a set of positive measure, so we see that $\operatorname{ess\,inf}(L_\omega^{(m)}(\mathbf{1}_I)) = 0$, hence the functional covering property does not hold.

The next lemma indicates the relative strengths of the covering properties and provides a computational estimate.

Lemma 4.3.4. Let $T_\omega : [-1, 1] \to [-1, 1]$ generate a cocycle of maps over $(\Omega, \mu, \sigma)$, where each $T_\omega$ satisfies M and has $\lambda(T_\omega([-1, 1])) = 1$. If the cocycle has the functional covering property, then it also has the dynamical covering property. If the cocycle has the dynamical covering property and, for $\mu$-almost every $\omega$, $\operatorname{ess\,inf}(g_\omega) > 0$, then it has the functional covering property. In either case, if $m_{DC}$ and $m_{FC}$ are the integers in the definitions of dynamical and functional covering, respectively, then $m_{DC}(\omega, I) = m_{FC}(\omega, I)$; moreover, if in this case we have that for $\mu$-almost every $\omega$, $\operatorname{ess\,inf}(g_\omega) > 0$, then for all $m' \ge m_{FC}(\omega, I)$ we have
\[
\operatorname{ess\,inf}\left( L_\omega^{(m')}(\mathbf{1}_I) \right) \ge \prod_{j=0}^{m'-1} \operatorname{ess\,inf}(g_{\sigma^j(\omega)}) > 0.
\]

Proof. By Lemma 3.2.3 and part (2) of Lemma 3.2.4, writing $g_\omega^{(m')}$ for the $g$ function of $L_\omega^{(m')}$, we know that for all $m' \ge 1$ and intervals $I$,
\[
L_\omega^{(m')}(\mathbf{1}_I) = \sum_n g_\omega^{(m')} \circ \left( T_\omega^{(m')} \right)_n^{-1} \cdot \mathbf{1}_{T_\omega^{(m')}(I \cap I_n)},
\]
where the sum is over the branches of $T_\omega^{(m')}$. By part (5) of Lemma 3.2.4, $g_\omega^{(m')}$ is $\lambda$-almost-everywhere non-zero, and so $L_\omega^{(m')}(\mathbf{1}_I)$ is $\lambda$-almost-everywhere non-zero on $\bigcup_n T_\omega^{(m')}(I \cap I_n) = T_\omega^{(m')}(I)$, and is $\lambda$-almost-everywhere zero outside of $T_\omega^{(m')}(I)$.

Suppose that the cocycle generated by $T_\omega$ does not have the dynamical covering property. Then there exists a positive measure set of $\omega$ such that for some interval $I$ with positive measure, there are infinitely many $m'$ such that $\lambda(T_\omega^{(m')}(I)) < 1$. But then $L_\omega^{(m')}(\mathbf{1}_I)$ is zero on a positive measure set by the above discussion, thus $\operatorname{ess\,inf}\left( L_\omega^{(m')}(\mathbf{1}_I) \right) = 0$, and so the cocycle does not have the functional covering property.

Now suppose that the cocycle generated by $T_\omega$ has the dynamical covering property, and that for $\mu$-almost every $\omega$, $\operatorname{ess\,inf}(g_\omega) > 0$. Without loss of generality, we may assume that $\operatorname{ess\,inf}(g_\omega) > 0$ for all $\omega$, by removing a $\sigma$-invariant set of measure zero from $\Omega$ (using the invertibility of $\sigma$). Fix an interval $I$ with positive measure, and an $\omega$ for which the dynamical covering property holds. Find $m$ such that for all $m' \ge m$, $T_\omega^{(m')}(I)$ has full measure. Let
\[
G_{m'} = \bigcap_{i=0}^{m'-1} g_{\sigma^i(\omega)}^{-1}\left[ \operatorname{ess\,inf}(g_{\sigma^i(\omega)}), \infty \right);
\]
this set also has full measure, by definition. Hence, for almost every $x \in T_\omega^{(m')}(I)$, there exists a $y' \in I$ such that both $T_\omega^{(m')}(y') = x$ and $g_{\sigma^i(\omega)}\left( T_\omega^{(i)}(y') \right) \ge \operatorname{ess\,inf}(g_{\sigma^i(\omega)})$ for each $i = 0, \ldots, m'-1$. Then we have, for such $x$:
\[
L_\omega^{(m')}(\mathbf{1}_I)(x) = \sum_{y \in \left( T_\omega^{(m')} \right)^{-1}(x)} g_\omega^{(m')}(y)\, \mathbf{1}_I(y) \ge g_\omega^{(m')}(y')\, \mathbf{1}_I(y') \ge \prod_{i=0}^{m'-1} \operatorname{ess\,inf}(g_{\sigma^i(\omega)}) \cdot 1 > 0,
\]
where we use the fact that each $g_{\sigma^i(\omega)}$ is bounded away from zero by hypothesis. This bound holds uniformly for almost every $x \in T_\omega^{(m')}(I)$, and hence $\operatorname{ess\,inf}\left( L_\omega^{(m')}(\mathbf{1}_I) \right) > 0$, which is the functional covering property.

Looking closely at the proofs, we see that in each implication, $m_{DC}(\omega, I)$ is equal to $m_{FC}(\omega, I)$. When $\operatorname{ess\,inf}(g_\omega) > 0$ for almost every $\omega$, the lower bound shown in the proof is the lower bound stated in the lemma.

Remark 4.3.5. By Lemma 3.2.4, each $T_\omega$ satisfying M sends sets of measure zero to sets of measure zero. Together with the almost-onto condition, we have that if $A$ is a set of measure 1, then $T(A)$ is also a set of measure 1:
\[
\lambda(T(A)) \ge \lambda(T([-1, 1])) - \lambda(T([-1, 1] \setminus A)) = 1 - 0 = 1.
\]
This simplifies the definitions of dynamical and functional covering, in the following way. Given $\omega$ and $I$, suppose that there exists $m \in \mathbb{Z}_{\ge 1}$ such that $T_\omega^{(m)}(I)$ has full measure in $[-1, 1]$. Then the next iterate of the cocycle preserves full measure, so that $T_\omega^{(m')}(I)$ has full measure for all $m' \ge m$ (skipping the formal induction argument). Similarly, if $\operatorname{ess\,inf}(L_\omega^{(m)}(\mathbf{1}_I)) > 0$, then for almost every $x$ there is a preimage $y$ of $x$ such that $g_{\sigma^m(\omega)}(y) \ge \operatorname{ess\,inf}(g_{\sigma^m(\omega)})$, and we have
\[
L_\omega^{(m+1)}(\mathbf{1}_I)(x) = \sum_{y \in T_{\sigma^m(\omega)}^{-1}\{x\}} g_{\sigma^m(\omega)}(y)\, L_\omega^{(m)}(\mathbf{1}_I)(y) \ge \operatorname{ess\,inf}(g_{\sigma^m(\omega)}) \cdot \operatorname{ess\,inf}\left( L_\omega^{(m)}(\mathbf{1}_I) \right) > 0.
\]
Repeating this, we see that $\operatorname{ess\,inf}\left( L_\omega^{(m')}(\mathbf{1}_I) \right) > 0$ for each $m' \ge m$. Because of this, to show that a cocycle has a covering property it suffices to verify it for a single iterate. The definition is stated more generally to handle maps which are not as well-behaved as those satisfying the assumptions M.

The terminology for covering is natural: in finitely many steps, every non-trivial interval expands to cover the whole space. In [33, 34], Liverani uses a version of the dynamical covering property for just one map to obtain (implicitly) functional covering, which is then used to proceed with a cone argument. Here, we will use the adjective ‘dynamical’ to refer to covering at the map level, which is potentially directly checkable in good cases by using some combinatorics and careful consideration of the maps. The functional covering property is stated by Buzzi in [8] as ‘covering’; it is a primary assumption and applied directly in the same way that Liverani uses the covering property. Buzzi then comments about the conditional implication in Lemma 4.3.4. The adjective ‘functional’ here refers to the fact that it has to do with operators and functions. We will be following Liverani’s approach in applying the covering property, but in a cocycle setting.

Example 4.3.6. In general, the functional covering condition is harder to check. Not only because the quantities involved are more complicated, but also for the simple reason that functional covering is stronger than dynamical covering. That is, the reverse implication in Lemma 4.3.4 is not true without the assumption on the gω being bounded away from 0. To see why, let T : [0, 1] → [0, 1] be defined piecewise, by setting T (1) = 1 and

 2i − 11/3 1 i − 1 i  T (x) = T (x) = x − + , x ∈ , , i = 1, 2, 3, 4. i 8 2 4 4

This map is depicted in Figure 4.5. Consider the cocycle generated by powers of T , over the singleton set with the identity map. All of the assumptions in M are easy to check, including that the associated oper- ator L preserves Lebesgue measure on [0, 1]. The derivative is calculated to be

1  2i − 1−2/3 i − 1 i  T 0(x) = x − , x ∈ , , i 3 8 4 4

0 accepting Ti (x) = ∞ for x ∈ {1/8, 3/8, 5/8, 7/8}, and is at least 4/3 for all x where defined. Moreover, the derivative grows arbitrarily large near x ∈ {1/8, 3/8, 5/8, 7/8}, and so not only is the function g is 0 at those points, but we have ess inf(g) = 0. The map has onto branches, so together with the derivative being bounded below by 4/3 we conclude that the map T has the dynamical covering property. However, T does not have the functional covering property. Let n ∈ Z≥1 and I ⊆ [0, 1] with

108 1

0.5

0 0 0.25 0.5 0.75 1

Figure 4.5: The map T in Remark 4.3.6. positive measure. Let  > 0 and find δ > 0 such that for the set

4 [ 2i − 1 2i − 1  B = − δ, + δ , δ 8 8 i=1

−n the restriction g is bounded above by 4 . Note that g and 1I are bounded above Bδ by 1, and that every point in [0, 1) has four preimages under T , thus has exactly 4n−1 preimages under T n−1. This means that for any y ∈ [0, 1], we have

n−2 ! n−1 X Y j n−1 L (1I )(y) = g(T (z)) 1I (z) ≤ 4 . z∈(T n−1)−1{y} j=0

Thus, for any x ∈ T (Bδ), we have: X  Ln(1 )(x) = g(y)Ln−1(1 )(y) ≤ · 4 · 4n−1 = . I I 4n y∈T −1{x}

n This shows that we can make L (1I ) as close to 0 as we want on a set of positive n measure, and hence ess inf(L (1I )) = 0 for all n and I. Hence T does not have the functional covering property.

Alone, the functional covering property does not allow us to perform explicit com- putations with the Perron-Frobenius operator acting on the cone Ca, because it only

109 tells us about push-forwards of characteristic functions. The following lemma allows us to say something about functions in the cone Ca. It is similar to Lemma 3.2 in [34] and has overlap with a claim in [8, pg. 32].

Lemma 4.3.7. Suppose that Tω :[−1, 1] → [−1, 1] generates a cocycle of almost- onto maps satisfying M and suppose that it satisfies the functional covering property. K Let a > 0 and let Q = {Qk}k=1 be a partition of [−1, 1] into closed intervals with −1 λ(Qk) ≤ (2a) for each k. Let mC (ω, Q) = maxk{mDC (ω, Qk)}. Then for µ-almost every ω, for all h ∈ Ca, and for m ≥ mC (ω, Q), we have

1 (m) (m) 1 ess inf(Lω (h)) ≥ min{ess inf(Lω ( Qk ))} khk1 . 2 k

If moreover, ess inf(gω) > 0 for almost every ω, then

m−1 ! (m) 1 Y ess inf(L (h)) ≥ ess inf(g i ) khk . ω 2 σ (ω) 1 i=0

K Proof. Fix a > 0 and a partition Q = {Qk}k=1 of [−1, 1] into closed intervals, where −1 λ(Qk) ≤ (2a) for all k. Let h ∈ Ca, and suppose for the sake of contradiction that 1 ess infx∈Qk (h) < 2 khk1 for each k. Note that h ≥ 0, so |h| = h. By Lemma 3.1.2, we have that for all k and λ-almost-every x ∈ Qk,

h(x) ≤ ess inf(h) + VarQk (h), Qk which integrates to Z

h dλ ≤ ess inf(h)λ(Qk) + VarQk (h)λ(Qk). x∈Qk Qk

We therefore obtain the following inequality:

K K X Z X khk1 = h dλ ≤ ess inf(h)λ(Qk) + VarQk (h)λ(Qk) x∈Qk k=1 Qk k=1 K K 1 X 1 X < khk λ(Q ) + Var (h) 2 1 k 2a Qk k=1 k=1 1 1 1 a = khk + Var (h) ≤ khk + khk = khk , 2 1 2a [−1,1] 2 1 2a 1 1

which is false. This means that there exists some interval Qk on which ess infx∈Qk (h) ≥ 1 2 khk1 (h is at least half its 1-norm on the whole interval); for different h, it may be a 1 1 different interval Qk, but there is at least one. We may write this as h ≥ 2 khk1 Qk . Now, by positivity of the Perron-Frobenius operator Lω (that is, non-negative func-

110 tions are mapped to non-negative functions, as in Lemma 3.2.4), for each m ∈ Z≥1 we have 1  1 L(m)(h) ≥ L(m) khk 1 = khk L(m)(1 ). ω ω 2 1 Qk 2 1 ω Qk

Set m ≥ mC (ω, Q). By Lemma 4.3.4 and the functional covering property, since

m ≥ mC (ω, Q) ≥ mDC (ω, Qk) we have 1  1 (m) (m) 1 (m) 1 ess inf(Lω (h)) ≥ ess inf khk1 Lω ( Qk ) = min{ess inf(Lω ( Qk ))} khk1 , 2 2 k where we take a minimum over k to produce a uniform lower bound for all h. In the case where ess inf(gω) > 0 for almost every ω, by Lemma 4.3.4 we know that for each k, we have m−1 (m) 1 Y ess inf(Lω ( I )) ≥ ess inf(gσj (ω)), j=0 so inserting this lower bound into the previous inequality completes the proof.

Thanks to this lemma, we have an almost-everywhere positive lower bound for the images of elements of the cone Ca with unit integral after m ≥ mC (ω, Q) steps:

m−1 1 Y ess inf(g j ). 2 σ (ω) j=0

We call mC (ω, Q) the covering time (with respect to Q).

4.4 Illustration of the Covering Method

Before we attempt to apply Theorem 2.4.3 to our cocycles of paired tent maps, we will use it to compute the spectral gap for a much easier example. Moreover, we prove the following lemma, which describes an upper bound for the projective distance θa from an element f in the subcone Cνa ⊆ Ca to the constant 1, in terms of the essential infimum and supremum of f; it is essentially Lemma 3.1 in Liverani [34]. This inequality is why we need estimates on both of these quantities, as well as requiring the cone to be mapped into the subcone Cνa; using the triangle inequality, we can bound the diameter of the image of the cone uniformly on a positive measure set of ω.

Lemma 4.4.1. Let f ∈ Cνa ⊆ BV (λ), with a > 0 and ν ∈ (0, 1), such that kfk1 = 1.

111 Then we have: 1 + ν ess sup(f) θ (1, f) ≤ log · . a 1 − ν ess inf(f) Proof. By Definition 2.1.11, we have

β(1, f) θ (1, f) = log . a α(1, f)

We will find a lower bound for α, and an upper bound for β. For γ > 0, we have 1 1 1 f − γ ∈ Ca if and only if γ ≤ ess inf(f) and Var(f − γ ) ≤ a kf − γ k1 (λ is the measure, so we use a different name for the parameter in α). The first inequality gives, upon integration, that γ ≤ ess inf(f) ≤ kfk1 = 1. Then, note that we have:

1 Var(f − γ ) = Var(f) ≤ νa kfk1 = νa, 1 1 a kf − γ k1 ≥ a |kfk1 − γ k k1| = a |1 − γ| .

We see that if γ ∈ (0, 1) satisfies 1 − γ = |1 − γ| ≥ ν, then we have Var(f − γ1) ≤ 1 a kf − γ k1. Rearranging for γ, we get γ ≤ 1 − ν. Combined with the ess inf require- ment, we have that

α(f, 1) = sup {γ > 0 : f − γ1 ∈ Ca} = min {1 − ν, ess inf(f)} ≥ (1 − ν) · ess inf(f), since 1 − ν < 1 and ess inf(f) ≤ 1.

Similarly, for µ > 0 we have that µ1 − f ∈ Ca if and only if ess sup(f) ≤ µ 1 1 and Var(µ − f) ≤ a kµ − fk1. The same integration of the first inequality gives

µ ≥ ess sup(f) ≥ kfk1 = 1, and then we can do similar computations to those above:

1 Var(µ − f) = Var(f) ≤ νa kfk1 = νa, 1 a kµ − fk1 ≥ a |µ kfk1 − kfk1| = a |µ − 1| .

If µ > 1 satisfies µ − 1 = |µ − 1| ≥ ν, or µ ≥ 1 + ν, then we get µ1 − f ∈ Ca. Analogously to α, we obtain

β(f, 1) = inf {µ > 0 : µ1 − f ∈ Ca} = max {1 + ν, ess sup(f)} ≤ (1 + ν) · ess sup(f), because 1 + ν > 1 and ess sup(f) ≥ 1.

Finally, we put the bounds together, to get the final bound on θa:

β(1, f) 1 + ν ess sup(f) θ (1, f) = log ≤ log · . a α(1, f) 1 − ν ess inf(f)

112 Example 4.4.2. Let d ∈ Z≥2, let Ω = Z/dZ = {1, . . . , d} be the cyclic group of order d equipped with addition, let µ be the normalized uniform (Haar) measure on Ω (equipped with the discrete σ-algebra), and let σ(i) = i + 1 be the simple rotation-by- one map (we use d instead of 0 to more easily define our maps). We see that Ω is a Polish space, µ is a complete Borel probability measure, and σ is a homeomorphism, so that (Ω, µ, σ) is an invertible ergodic probability-preserving transformation (quite a rigid transformation, in fact). For each i ∈ Z/dZ, let Ti : [0, 1] → [0, 1] be the (n) map Ti(x) = ix (mod 1), and generate a cocycle of maps Ti and associated cocycle (n) of Perron-Frobenius operators Li . The resulting random dynamical system is µ- continuous. Each map Ti satisfies the conditions M and so we may apply our Lasota- 0 −1 −1 Yorke inequality (Proposition 3.2.10) to Li. For Ti, observe that gi = |Ti | = i except at the endpoints of the intervals of monotonicity, and moreover gi is constant in the interiors of those intervals. Each Ti has no hanging points, so we can just set the intervals J to be the intervals of monotonicity (though it does not matter). Thus, we obtain: 1 Var(L f) ≤ Var(f), i i for f ∈ BV (λ).

We need a nice cone; for simplicity, pick a = 1, and observe that every Li preserves the cone, since if f ∈ C1, we have 1 Var(L f) ≤ Var(f) ≤ Var(f) ≤ kfk . i i 1

We will show that when i 6= 1, Li actually contracts the cone. First, we have LiC1 ⊆ 1 C1/2, because we have Var(Lif) ≤ 2 Var(f) and in Remark 4.0.2 we have C1 = 0. Then, observe that Ti is uniformly expanding with onto branches (except for T1 = id, which has onto-branches at minimum), and so the cocycle of maps T has the dynamical covering property; each gi is bounded below, so the cocycle has the functional covering property by Lemma 4.3.4. In Lemma 4.3.7, choose Q1 = [0, 1/2] and Q2 = [1/2, 1], and observe that both intervals expand to cover the whole space in at most one step unless

T1 is used, in which case it is 2 steps (noting the rigidity of σ); thus mC (i, Q) = 1 if i 6= 1 and mC (1, Q) = 2.

So, set GP = {2, . . . , d} and kP = 1. If f ∈ C1/2, then we have, for all i ∈ GP :

3 ess sup(f) ≤ Var(f) + kfk ≤ kfk . 1 2 1 Then, by Lemma 4.3.7, for h ∈ C we have

1 ess inf(L (h)) ≥ ess inf(g ) khk , i 2 i 1

113 1 so uniformly in i ∈ GP we have ess inf(Li(h)) ≥ 2d khk1. We therefore use Lemma 4.4.1 to compute, for any f ∈ C1 and i ∈ GP :   1 + 1/2 ess sup(Lif) θ1(Lif, 1) ≤ log · 1 − 1/2 ess inf(Lif)  3  2 kLifk1 ≤ log 3 · 1 = log(9d). 2d kfk1

By the triangle inequality, we see that diamθ1 (LiC1) ≤ 2 log(9d); set DP = 2 log(9d). By Theorem 2.4.3, we have that there is an equivariant density in C1 corresponding to the largest Lyapunov exponent; in particular, each Li preserves Lebesgue measure, and so the constant 1 is the equivariant density. More interestingly, the second-largest

Lyapunov exponent λ2 satisfies: ! (d − 1)/2 log(9d)−1 λ ≤ − log tanh 2 1 2 d − 1 9d + 1 d − 1  2  − log = − log 1 + . d 9d − 1 d 9d − 1

At this point, we can also observe that the maps Ti commute with each other; we have

Tb(Ta(x)) = bax − b baxc − bbax − b baxcc = bax − bbaxc for all a, b ∈ Z≥1. Thus Td ◦ ...T1 = Td!, and so after d steps, for all i we are actually applying Td!. The operator Ld! shrinks variation by a factor of d!, and so for any f ∈ BV (λ), we have (by the proof of Lemma 3.1.2):

 1 n ess sup Ln f − ess inf(Ln f) ≤ Var(Ln f) ≤ Var(f). d! d! d! d!

Now, if f also has zero integral, then ess inf(f) ≤ 0 ≤ ess sup(f), and so

kfk1 ≤ ess sup(f) − ess inf(f).

The operator Ld! preserves the integral (and has BV -norm equal to 1), and so for f ∈ BV (λ) with zero integral, we have:

n n n kLd!fkBV = Var(Ld!f) + kLd!fk1  1 n ≤ Var(f) + (ess sup Ln f − ess inf(Ln f)) d! d! d!  1 n  1 n ≤ kfk + Var(Ln f) ≤ 2 kfk . d! BV d! d! BV

114 Hence for f ∈ BV (λ) with zero integral and any i ∈ Ω, we have

1 1 (n) (rn) (jn) lim sup log Li f = lim sup log Li Ld! f n→∞ n BV n→∞ djn + rn BV 1 dj 1 n (jn) ≤ lim sup log Ld! f d n→∞ djn + rn jn BV ! 1 1  1 j ≤ lim sup log 2 kfkBV d j→∞ j d! 1  1  1 = log = − log(d!). d d! d

1 Hence λ2 is actually at most − d log(d!), which is significantly more negative than our bound obtained from Theorem 2.4.3: by Stirling’s approximation, we have

1 1 − log(d!) = − (d log(d) − d + O(log(d))) = − log(d) + 1 + O(d−1 log(d)), d d which tends to negative infinity, whereas the bound from Theorem 2.4.3 is easily seen to tend to 0. Another way to approach the question of the spectral gap in Example 4.4.2 is to no- tice that the maps T1,...,Td are simultaneously Markov; we address this computation in the Appendix in Example A.3.2.

(n) 4.5 Contraction of the Cone - Spectral Gap for Lω

We are now ready to state and prove the precise version of Theorem B from the Introduction, Theorem 4.5.1. Theorem 4.5.1. Let (Ω, µ, σ) be an ergodic, invertible, probability-preserving transfor- mation, where Ω is homeomorphic to a Borel subset of a Polish space, µ is a complete

Borel probability measure on Ω, and σ is a homeomorphism, and let 1, 2 :Ω → [0, 1] be measurable functions with countable range which are both not µ-a.e. equal to 0. Let Z Z  1 2 M1,2 = min 1 dµ, 2 dµ ,D1,2 = 4(1 + max{ess sup(1), ess sup(2)}) . 2 Ω Ω

(n) 2 Let Tω = T1(ω),2(ω) generate a cocycle Tω of paired tent maps, and suppose that σ is ergodic on Ω1 ⊆ Ω with the restriction µ1 of µ to Ω1, with µ(Ω1) > 0. Then there exists an explicit set GP ⊆ Ω1 with positive measure and explicit numbers a > 0, ν ∈ (0, 1), and d ∈ Z≥1 such that upon setting

 log(a)   log(M ) k = + 1 + d + − 1,2 P log(1.5) log(4)

115 and      µ1(GP ) 1 2(1 + ν) 1 C(1, 2, a, ν, d) = log tanh log (1 + νa) + kP log(D1,2 ) 2kP 4 1 − ν 4 we have:

λ2 ≤ C(1, 2, a, ν, d) < 0 = λ1, where λ1 and λ2 are the largest and second-largest Lyapunov exponents for the cocycle (n) of Perron-Frobenius operators for Tω .

Note that this upper bound for λ2 is likely not optimal; it is simply the result of our particular method of proof. In particular, in the next section we use a smaller GP and a larger kP to obtain a better diameter bound DP . The result is an upper bound which happens to have a much nicer asymptotic property as the 1 and 2 parameters are scaled towards 0, but which requires a usage of Birkhoff’s ergodic theorem to obtain the asymptotic relationship, holding only for sufficiently small scaling parameters. Here, we have chosen to use the smallest kP , and the set G is computable simply from the maps 1 and 2. The proof will proceed via Lemmas 4.5.2 and 4.5.3 and Corollary 4.5.4, specific to these maps, to bound the mDC (ω, I) terms, which will be combined with Lemmas

4.4.1 and 4.3.7 to bound the diameter of the image of the cone Ca. The computation involved allows us to explicitly control the covering time mC (ω, Q) for a fixed choice of Q and on a positive measure set of ω, which is enough to apply our general theory to obtain explicit bounds on the second Lyapunov exponent; the covering time turns out to be exactly our contraction time. The next two lemmas and corollary are combinatorial in nature, meaning we are looking solely at the dynamics and counting a number of steps until the desired event occurs, keeping track of the measures involved. The goal is to show that the cocycle (n) Sω has the dynamical covering property, with very explicit estimates for the covering times.

The first lemma describes how quickly intervals expand under the action of Sω to cover one of [−1, 0] or [0, 1].

Lemma 4.5.2. Let Tω generate a cocycle of paired tent maps over (Ω, µ, σ), and let 2 Sω generate the cocycle of second iterate maps over (Ω1, µ , σ ). Ω1 1. If I ⊆ [−1, 1] is an interval with positive measure of the form [−1, b], [b, 0], [0, b], (m) or [b, 1], then Sω (I) contains [−1, 0], if I is [−1, b] or [b, 0], or [0, 1], if I is [0, b] or [b, 1], in at most − log(2λ(I)) m = log(4) steps.

116 (m) 2. If I ⊆ [−1, 1] is an interval with positive measure, then Sω (I) contains one of [−1, 0] or [0, 1] in at most

− log(2λ(I)) m = + 1 log(1.5)

steps.

Proof. First, let I ⊆ [−1, 1] be an interval with positive measure of the form [−1, b]. If

I contains I11, then we have  Sω(I) ⊃ Sω(I11) = Tσ(ω) [−1, 0] ⊃ [−1, 0].

If I is contained in I11, then because Sω restricted to I11 is an affine map with expansion at least 4, we have that λ(Sω(I)) ≥ 4λ(I). If Sω(I) covers I11, then we apply the next iterate and are finished, but if not we continue to iterate, to obtain

(n) n λ(Sω (I)) ≥ 4 λ(I).

(n) n We are looking for Sω (I) to cover [−1, 0], which has measure 1/2, so we solve 4 λ(I) ≥ 1/2 to obtain − log(2λ(I)) n ≥ . log(4) Potentially, it could take fewer steps than that because the expansion rate could be larger. For the other cases, replace I11 with I21, I34, or I44, respectively, and argue similarly (the symmetries for the maps Sω allow the same argument to work almost verbatim). For the second part, we begin by restricting I to lie inside of one of the intervals of continuity for Sω, which in the index notation are:

I11 ∪ I12,I13 ∪ I14 ∪ I24 ∪ I23,I22 ∪ I21,

I34 ∪ I33,I32 ∪ I31 ∪ I41 ∪ I42,I43 ∪ I44.

Refer to Figure 4.3. When we push I forward by Sω, I expands according to the slope of Sω, but there might be overlap in the image due to the number of monotonicity branches. Where there are two branches (for example, in I11 ∪ I12), we know that the slope is at least 4, and there is at worst the possibility that the interval image exactly overlaps on each branch, so we observe that

4 λ(S (I)) ≥ λ(I) = 2λ(I). ω 2

On the other hand, where there are four branches (for example, in I13 ∪ I14 ∪ I24 ∪ I23

117 where the middle branches are non-trivial), the scale factor is at least 6 (by looking at the formula for the derivative of Sω), and at worst there are four overlapping sections of the image. This means that

6 λ(S (I)) ≥ λ(I) = 1.5λ(I). ω 4

Thus in all restricted situations, λ(Sω(I)) ≥ 1.5λ(I).

Continuing to work with the restricted intervals I, write Sω(I) as   Sω(I) = Sω(I) ∩ [−1, 0] ∪ Sω(I) ∩ [0, 1] .

We have two cases: in one case, as we apply the cocycle to I the resulting set does not split into two pieces of positive measure (one in [−1, 0], the other in [0, 1]). The scale factor is at least 1.5 at each step, and so we solve (1.5)nλ(I) ≥ 1/2 to find a bound on the number of steps it takes to cover one of [−1, 0] or [1, 0]:

− log(2λ(I)) n ≤ . log(1.5)

In the other case, at some point the image of I splits into two intervals, one contained in [−1, 0] and the other in [0, 1]. By the first part of the lemma, the resulting intervals scale in length by a factor of 4. If we only look at the larger of the two intervals, the 1.5 (k) (k) size of the interval is at least 2 λ(Sω (I)) = 0.75λ(Sω (I)), and in the next step the length scales by 4, and 4 · 0.75 > (1.5)2. We therefore see that the number of steps it (m) takes the image of I under Sω to cover one of [−1, 0] or [0, 1] is no more than in the case where the interval does not split, except if there is a split in the interval instead of covering all of [−1, 0] or [0, 1]. In this case, the next iteration produces the covering, and so we simply add 1 to our previous bound. Finally, suppose that I is an interval that overlaps at least two adjacent intervals of continuity, listed above. If it contains at least two adjacent intervals of continuity (meaning it overlaps at least four adjacent intervals of continuity), one of them must have an image that covers [−1, 0] or [0, 1] in one step. The easiest way to see this fact is to look at Figures 4.2 and 4.3 and observe that if I covers any two adjacent intervals of continuity, the intervals cannot both avoid covering one of [−1, 0] or [0, 1]. So, suppose that I overlaps at most three adjacent intervals of continuity. Then I intersects one of the intervals with more than one-third of its measure; call the intersection I0. If I0 has more than two branches of monotonicity of Sω, then I contains a full branch of monotonicity over one of I13, I23, I32, or I42, and the images of these intervals are [0, 1] or [−1, 0], respectively, because at least one of I14, I24, I31, or I41 is non-trivial. Finally,

118 0 suppose that I has at most two branches of monotonicity of Sω. Then we have 1 1 2 λ(S (I0)) ≥ · 4 · λ(I) = λ(I). ω 2 3 3 This image interval is of the form [−1, b] or [b0, 1], so in the next step its measure scales 2 2 0 with factor 4, and 4· 3 > (1.5) , so in two steps I expands faster than expanding under the 1.5 scale factor. The argument from the case where I was contained in an interval of continuity now concludes the proof.

A key aspect of the paired tent maps, when 1 and 2 take non-zero values, is that mass can “leak” from [−1, 0] to [0, 1] and vice versa. In fact, this leakage is the principal reason to study these maps, because it directly leads to mixing properties of the maps. The next lemma describes how long it can take for leakage to occur in both directions, for certain ω.

Lemma 4.5.3. Let Tω generate a cocycle of paired tent maps over (Ω, µ, σ) with 1, 2 :

Ω → [0, 1] having positive integrals, and let Sω generate the cocycle of second iterate 2 maps over (Ω1, µ1, σ ). Let

1 Z Z  M1,2 = min 1 dµ, 2 dµ . 2 Ω Ω

Then there exists an explicit set G ⊆ Ω1 with positive measure and an explicit positive integer d such that for every ω ∈ G,

(d)  (d)  2λ Sω ([−1, 0]) ∩ [0, 1] ≥ M1,2 and 2λ Sω ([0, 1]) ∩ [−1, 0] ≥ M1,2 .

(d) (d) In particular, Sω ([−1, 0]) ∩ [0, 1] and Sω ([0, 1]) ∩ [−1, 0] each contain intervals of the 1 form [0, a], [a, 1], [−1, a], and [a, 0] with total measure at least 2 M1,2 .

2 Proof. Recall that we define Ω2 to be all of Ω if σ is ergodic on Ω and Ω \ Ω1 if not.

We define G1 and G2 by

−1  −1 −1  Gk = Ω1 ∩ k [M1,2 , 1] ∪ σ Ω2 ∩ k [M1,2 , 1] ,

2 for k = 1, 2. By ergodicity of σ on Ω1, there exists a smallest d12 ∈ Z≥1 such −2d12 that G1 ∩ σ (G2) has positive measure, and there exists a smallest d21 such that −2d21 G2 ∩ σ (G1) has positive measure. Set d = max{d12, d21}, and define

d−1 ! d−1 ! [ −2i [ −2i G = G1 ∩ σ (G2) ∪ G2 ∩ σ (G1) . i=0 i=0

−2d12 −2d21 This set has positive measure, as it contains both G1 ∩ σ (G2) and G2 ∩ σ (G1).

119 To show that G has the required property, observe from the form of Sω that:

Sω([−1, 0]) ∩ [0, 1] = [0, 1(σ(ω))] ∪ [1 − 2(1 + 2(σ(ω)))1(ω), 1],

Sω([0, 1]) ∩ [−1, 0] = [−1, −1 + 2(1 + 1(σ(ω)))2(ω)] ∪ [−2(σ(ω)), 0].

Sd−1 −2i Let ω ∈ G. Then we have two cases. Suppose first that ω ∈ G1 ∩ i=0 σ (G2). By def- inition of G1, we have max{1(ω), 1(σ(ω))} ≥ M1,2 , and since 2(1 + 2(σ(ω)))1(ω) ≥ 1(ω), we see that

2λ(Sω([−1, 0])) ≥ M1,2 .

d12 Then, we know that σ (ω) ∈ G2, so that

(2d12) 2d12 2d12+1 2λ(Sω ([0, 1]) ∩ [−1, 0]) ≥ max{2(σ (ω)), 2(σ (ω))} ≥ M1,2 .

(d12) All intervals making up Sω([−1, 0])∩[0, 1] and Sω ([0, 1])∩[−1, 0] continue to expand under future iterates of the cocycle, and so we have

(d) (d) min{2λ(Sω ([−1, 0]) ∩ [0, 1]), 2λ(Sω ([0, 1]) ∩ [−1, 0])} ≥ M1,2 .

Sd−1 −2i The analogous argument holds for ω ∈ G2 ∩ i=0 σ (G1). This completes the proof.

The previous two lemmas allow us to show that intervals with positive measure cover the whole space by first covering one of [−1, 0] or [0, 1] and then leaking to the other side and expanding. The following corollary gives a precise statement with a quantitative bound on the covering time.

Corollary 4.5.4. Let Tω generate a cocycle of paired tent maps over (Ω, µ, σ) with

1, 2 :Ω → [0, 1] having positive integrals and let Sω generate the cocycle of second 2 iterate maps over (Ω1, µ1, σ ). Let M1,2 , G, and d be as in Lemma 4.5.3. For all τ ∈ (0, 1), let − log(2τ) m (τ) = + 1, 1 log(1.5)

 2(m1(τ)+m) let m2(ω, τ) = min m ≥ 0 : σ (ω) ∈ G , and let

− log(M ) m = 1,2 . 3 log(4)

Then m2(ω, τ) is finite for almost every ω, and if I ⊆ [−1, 1] is an interval with λ(I) ≥ τ, we have

mDC (ω, I) ≤ m1(τ) + m2(ω, τ) + d + m3.

120 (n) Thus Sω has the dynamical covering property.

Proof. Fix τ ∈ (0, 1), and let I ⊆ [−1, 1] be an interval with λ(I) ≥ τ. By part (b) of (m) Lemma 4.5.2, we see that Sω (I) contains one of [−1, 0] or [0, 1] in at most

− log(2λ(I)) − log(2τ) + 1 ≤ + 1 = m (τ) log(1.5) log(1.5) 1

(m1(τ)) steps, which can be rephrased as saying Sω (I) covers one of [−1, 0] or [0, 1] for all

ω. By Lemma 4.5.3, there exists an explicit set G ⊆ Ω1 with positive measure and a positive integer d such that for every ω ∈ G,

(d) (d) 2λ(Sω ([−1, 0]) ∩ [0, 1]) ≥ M1,2 and 2λ(Sω ([0, 1]) ∩ [−1, 0]) ≥ M1,2 .

By ergodicity of σ2 on Ω , we have that S∞ σ−2m(G) is a set of full measure, and 1 m=m1(τ)  2(m1(τ)+m) thus m2(ω, τ) = min m ≥ 0 : σ (ω) ∈ G is finite for almost every ω. Since (m1(τ)) Sω (I) contains one of [−1, 0] or [0, 1] for all ω and Sω is onto for all ω, we have (m1(τ)+m2(ω,τ)) that Sω (I) also contains one of [−1, 0] or [0, 1], for almost every ω. Then in d more iterates, this set is guaranteed to leak into the other half of the space [−1, 1], 1 with minimum measure 2 M1,2 . This leakage takes the form of one or two intervals with one endpoint being −1, 0, or 1, as appropriate. By part (a) of Lemma 4.5.2, we see that the leaked mass expands to cover the remainder of the space [−1, 1] in at most & ' − log 2 · 1 M  − log(M ) m = 2 1,2 = 1,2 3 log(4) log(4) steps. Putting all of this together, we get that for almost every ω ∈ Ω1,

(m1(τ)+m2(ω,τ)+d+m3) Sω (I) = [−1, 1].

(n) Hence mDC (ω, I) ≤ m1(τ) + m2(ω, τ) + d + m3 < ∞, and so Sω has the dynamical covering property.

We are now equipped to prove Theorem 4.5.1; all of the previous results were to set up an application of Theorem 2.4.3 to the cocycle of Perron-Frobenius operators (n) for Sω .

l 6 m Proof of Theorem 4.5.1. Choose ν ∈ (3/4, 1) and a = ν−3/4 , so that Sω(Ca) ⊆ Cνa 2a for all ω. Since a is an integer, let Q = {Qk}k=1 be a uniform partition of [−1, 1] into −1 −1 closed intervals, so that λ(Qk) = (2a) for all k. In Corollary 4.5.4, set τ = (2a) , −2m1(τ) and let GP = σ (G), where G is the set from Lemma 4.5.3. For ω ∈ GP , we have

121 m2(ω, τ) = 0. Thus for each k, we have

−1 mDC (ω, Qk) ≤ m1((2a) ) + m2(ω, τ) + d + m3  log(a)   log(M ) = + 1 + d + − 1,2 =: k . log(1.5) log(4) P

In the proof of Proposition 4.2.1, we computed gω, the weight function for the

Perron-Frobenius operator Pω associated to Sω. In particular, we have 1 ess inf(g ) ≥ = D−1 > 0 ω 2 1,2 4(1 + max{ess sup(1), ess sup(2)})

for all ω, recalling the definition of D1,2 from the statement of Theorem 4.5.1. There- (n) fore, by Lemma 4.3.4 and Corollary 4.5.4 the cocycle Sω has the functional covering property. Let mC (ω, Q) = maxk{mDC (ω, Qk)} as in Lemma 4.3.7, and observe that kP ≥ mC (ω, Q) for every ω ∈ GP . Applying Lemma 4.3.7 gives us that for every h ∈ Ca with khk1 = 1 and ω ∈ GP , we have

1 kP −1 1 (kP ) Y −kP ess inf(P (h)) ≥ ess inf(g i ) ≥ D . ω 2 σ (ω) 2 1,2 i=0

By Lemma 3.1.2, we have that

(kP ) (kP ) (kP ) ess sup(Pω (h)) ≤ ess inf(Pω (h)) + Var(Pω (h)).

(kP ) In general, ess inf(·) ≤ k·k1, and we know Pω (h) ∈ Cνa, so that

(kP ) (kP ) (kP ) ess sup(Pω (h)) ≤ Pω (h) 1 + a Pω (h) 1 = (1 + a) khk1 = 1 + a, since Pω preserves k·k1. Inserting these bounds into Lemma 4.4.1 gives us ! 1 + ν ess sup(P (kP )(h)) (kP ) ω θa(1,P (h)) ≤ log · ω (kP ) 1 − ν ess inf(Pω (h)) 2(1 + ν)  ≤ log (1 + νa) + k log(D ). 1 − ν P 1,2

Using the triangle inequality and the scale-invariance of the projective metric θa, we

122 obtain that for all ω ∈ GP ,

kP  kP diamθa (Pω (Ca)) = sup θa(f1, f2): f1, f2 ∈ Pω (Ca)  1 (kP ) ≤ 2 sup θa( ,Pω (h)) : h ∈ Ca, khk1 = 1 2(1 + ν)  ≤ 2 log (1 + νa) + 2k log(D ) =: D . 1 − ν P 1,2 P

Observe that our assumptions on (Ω, µ, σ) and 1, 2 imply that the random dy- namical system is µ-continuous, because 1 and 2 have countable range and therefore are constant on measurable subsets of Ω. We now apply the cocycle Perron-Frobenius (n) theorem, Theorem 2.4.3, to Sω with parameters GP , kP , and DP . The result is that 0 0 (n) if λ1 and λ2 are the first and second largest Lyapunov exponents for the cocycle Sω , we have      0 µ1(GP ) 1 2(1 + ν) 1 0 0 λ2 ≤ log tanh − log (1 + νa) + kP log(D1,2 ) + λ1 < λ1. kP 4 1 − ν 4

0 We know that λ1 = 0, because khk1 ≤ khkBV ≤ (1 + a) khk1 for all h ∈ Ca and Pω preserves the integral of non-negative elements of BV (λ); every element of Ca has 0 0 Lyapunov exponent λ1 for Pω, so we get 0 ≤ λ1 ≤ 0. Finally, to convert this to a (n) statement about the Lyapunov exponents for Tω , λ1 = 0, and λ2, we apply Lemma 4.1.2 in the case of k = 2, to obtain:      µ1(GP ) 1 2(1 + ν) 1 λ2 ≤ log tanh − log (1 + νa) + kP log(D1,2 ) < 0 = λ1. 2kP 4 1 − ν 4

All of the terms that make up the bound here are explicitly computable for specific examples. This concludes the proof.

Example 4.5.5. Let Ω = [0, 1] equipped with Lebesgue measure µ and the irrational rotation σ(ω) = ω + α; assume α ∈ (0, 1/3). Define 2 to by

( 2  0 ω ∈ {0} ∪ 3 , 1 , 2(ω) = 1 1 1  2n+1 ω ∈ 3·2n , 3·2n−1 , n ∈ Z≥0, and set 1(ω) = 2(1−ω). Then 1 and 2 are both measurable with countable range and (Ω, µ, σ) is an invertible, ergodic, probability-preserving homeomorphism on a Polish space equipped with a complete Borel measure, so we satisfy the conditions in Theorem 2.4.3. We can compute

∞ 1 Z Z  1 X 1 1 1 1 1 M = min  dµ,  dµ = · = · = . 1,2 2 1 2 2 2n+1 3 · 2n 12 1 − 1/4 18 Ω Ω n=0

123 l 6 m Set ν = 7/8 and a = ν−3/4 = 48. The partition Q is a collection of 96 uniformly distributed intervals, with measures 96−1, so set τ = 96−1, and compute

− log(1/48) m (96−1) = + 1 = 11. 1 log(1.5)

We can choose G = [1/3, 2/3] ⊆ Ω and d = 1, since in one step each Sω will leak over −22 at least 1/2 ≥ 1/36 mass, so we get GP = σ [1/3, 2/3]. We compute kP as above:

 log(1/18) k = 11 + 1 + − = 14. P log(4)

2 Finally, D1,2 is equal to 4(1 + 1/2) = 9, and so we obtain DP as above:

2(1 + 7/8)  D = 2 log (1 + 42) + 2 · 14 log(9) = 2 log(1290) + 28 log(9) ≈ 75.847. P 1 − 7/8

Thus we get our upper bound for λ2 to be:

1/3  D  1 λ ≤ − log tanh( P )−1 ≈ log (tanh(75.847)) . 2 2 · 14 4 84

This quantity is close to zero, but it is still negative. We will see in the next section that by keeping track of the mass transfer more effectively, we can obtain more interesting information, namely the asymptotic order for an upper bound.

Example 4.5.6. Suppose that 1 = 1 = 2; then every Tω = T1,1 is the same and we are mimicking the deterministic case. By the same calculation as in Example 4.4.2, we see that each Tω preserves Lebesgue measure, and the second-largest Lyapunov exponent

λ2 for the associated cocycle of Perron-Frobenius operators satisfies λ2 ≤ − log(4) = 1 − 2 log(16), where the latter quantity is the second-largest Lyapunov exponent for the 2 operators associated to Sω = T1,1. We can also see what our general result says. As in Example 4.5.5, choose ν = 7/8, −1 1 a = 48, and τ = 96 , so that m1(τ) = 11. We see that M1,2 = 2 and D1,2 = 16, and GP can be chosen to be Ω. We get kP = 13, and computing DP gives us

2(1 + 7/8)  D = 2 log (1 + 42) + 2 · 13 log(16), P 1 − 7/8 so we obtain 1  D  1 λ ≤ log tanh P ≈ log (tanh(21.603)) , 2 13 4 13 which is also close to zero while still being negative. To improve the estimate, instead

124 note that Pω contracts variation by a factor of 8 and contributes no 1-norm term in the Lasota-Yorke inequality, so we can choose ν = 1/8, a = 1, and τ = 1/2, hence m1(τ) = 1 and kP = 3. Then we obtain

2(9/8)  81 D = 2 log (1 + 1/8) + 2 · 3 log(16) = 2 log + 6 log(16), P 7/8 28 and so 1  D  1 λ ≤ log tanh P ≈ log (tanh(4.690)) ≈ −2.813 · 10−5. 2 2 · 3 4 6

The moral of the story is that in the general case we are throwing away a large amount of the contraction while we are waiting for the map to, presumably, only contract the cone once.

Remark 4.5.7. Note that in Theorem 4.5.1 we assumed that 1, 2 both had countable range. In this way, the map ω → Lω becomes µ-continuous. In the event that 1, 2 do not have countable range (almost-everywhere), there is no way that ω → Lω can be µ-continuous into B(BV (λ)) in most nice circumstances, because the Perron-Frobenius operators are discrete in the BV operator norm topology, and so we can force arbitrary sets to be measurable. To see this fact (for the operators associated to paired tent maps), let δ1, 1, δ2, 2 ∈ [0, 1] with 1 > δ1, and compute:

1 1 1 Lδ1,δ2 [−1,−1/2] = [−1,δ1], 2(1 + δ1) 1 1 1 L1,2 [−1,−1/2] = [−1,1]. 2(1 + 1)

The variation of the difference of these (equivalence classes of functions) is then (see Figure 4.6):

1  1 1 1 1 1 Var (Lδ1,δ2 − L1,2 ) [−1,−1/2] = + ≥ + = . 2(1 + δ1) 2(1 + 1) 4 4 2 1 The same can be done for 2 > δ2 with [1/2,1], and hence the norm of any Lδ1,δ2 −L1,2 is at least 2/5 when at least one of the parameters differ (since 1[−1,−1/2] and 1[1/2,1] have BV -norm equal to 5/4). The union of balls of radius 1/5 around an arbitrary collection of operators Lω is then open, so arbitrary sets in Ω would be measurable if

Lω were measurable, which is generally untrue. The same argument can be made for the second iterate maps Sω. To tackle the uncountable range question, we would need to choose a different (preferably separable) Banach space on which our operators can act and which allows

ω → Lω to be (strongly) measurable, and find a nice cone that is contracted and

125 1 − 1 2(1+δ1) 2(1+1)

δ1 1

− 1 2(1+1) 1 Figure 4.6: A picture of (Lδ1,δ2 −L1,2 ) [−1,−1/2] for 0 < δ1 < 1. The first jump is of size 1 , and the second jump is of size 2(1+δ1) 1 , so the variation is the sum of the two jump sizes. 2(1+1) sometimes preserved. Potentially there are some sort of Sobolev spaces or anisotropic Banach spaces that would work; see, for example, [4] for why that guess could be reasonable, and see [19] for an example of applying the MET to operator cocycles on such a space.

4.6 Perturbation Asymptotics

If we set 1 = 2 = 0, then the cocycle we obtain is simply the deterministic case of tak- ing powers of the map T0,0. This map has a two-dimensional eigenspace corresponding to the eigenvalue 1, spanned by the invariant densities 1[−1,0] and 1[0,1]. If we perturb

1 and 2 by constants, then T1,2 only has a single invariant probability density, rather than two, and there is a spectral gap between the eigenvalue 1 and the next largest eigenvalues. If 1 and 2 are both non-zero constants but we consider what happens as they tend to zero, the spectral gap (as a function of 1 and 2) shrinks towards zero (see [27], Remark 4). We can say something similar in the random setting, as well, with our cocycle of paired tent maps. If we fix 1 and 2 with countable range and then scale them down to zero with a parameter κ ∈ (0, 1], then we can look at the Lyapunov exponents (instead of eigenvalues), and we can investigate what the spectral gap can be. By Theorem 4.5.1 (an application of Theorem 2.4.3), we know that each of these cocycles does have a spectral gap with a one-dimensional Oseledets space corresponding to λ1 = 0; it is a natural next question to ask how big the spectral gap is. In general, this is a very challenging question, but in our case, we can compute an asymptotic order for a lower bound on the spectral gap (which is an upper bound on the second-largest Lyapunov exponent).

126 f(x) For notation, we say that f(x) ∼ g(x) when lim = 1, and we say that a(x) . x→0+ g(x) b(x) when there exist c(x) and d(x) such that a(x) ∼ c(x), b(x) ∼ d(x), and c(x) ≤ d(x). We prove the following theorem.

Theorem 4.6.1. Let (Ω, µ, σ), 1, and 2 be as in Theorem 4.5.1. Let κ ∈ (0, 1] and define Tω := Tκ1(ω),κ2(ω). Then there exists c > 0 such that the second-largest Lyapunov exponent λ2 of the cocycle of Perron-Frobenius operators satisfies λ2(κ) . −cκ.

This theorem says that there is an upper bound for λ2(κ) that is asymptotically linear in κ. The statement of Theorem C in the Introduction is obtained from Theorem 4.6.1 by unravelling the limits involved in the asymptotic calculations and making c closer to 0 if necessary. It turns out that the bound C(κ) from Theorem 4.5.1 is asymptotically propor- tional to κ/ log(κ), using standard asymptotic equivalences; see Lemma A.2.1. The improvement in Theorem 4.6.1 comes from allowing the wait time kP to increase as κ decreases and meanwhile being much more careful about accounting for the mass that leaks between the two intervals. As mentioned at the beginning of this chapter, we choose to present both upper bounds for the second Lyapunov exponent to illustrate multiple outcomes of the same technique that serve somewhat different purposes. We begin with simple computational equalities for the M and D terms. We see that κ acts strictly as a scale parameter, without affecting the underlying structure of the cocycle.

Lemma 4.6.2. We have, for κ ∈ (0, 1]:

Mκ1,κ2 = κM1,2 , 2 Dκ1,κ2 = 4(1 + κB1,2 ) ,

where B1,2 = max{ess sup(1), ess sup(2)}. Proof. These are straightforward computations. First, we have:

1 Z Z  κ Z Z  Mκ1,κ2 = min κ1 dµ, κ2 dµ = min 1 dµ, 2 dµ = κM1,2 . 2 Ω Ω 2 Ω Ω

Then we have:

2 Dκ1,κ2 = 4(1 + max{ess sup(κ1), ess sup(κ2)}) 2 2 = 4(1 + κ max{ess sup(1), ess sup(2)}) = 4(1 + κB1,2 ) .

In the proof of Theorem 4.5.1, we took the smallest kP possible, which ended up being a constant plus a term logarithmic in M1,2 . We then chose our set GP to be a set where minimum leakage from one half of [−1, 1] to the other happened in a

127 bounded number of steps, no matter the direction. Finally, we computed a bound for the diameter using a lower bound for the pushforward of characteristic functions of finitely many intervals and facts about covering, and applied Theorem 2.4.3. For the lower bound of the pushforwards of characteristic functions, we used the dynamical covering property of the maps and the lower bound on the gω functions, which was determined using the fact that there is at least one kP -th preimage of any point in the original interval, and taking the product of the scale factors along the orbit. In general, one expects multiple leakage times and mixing within the halves as the system runs. Choosing kP to be larger and accounting for the times at which leakage occurs means that we can account for more leakage terms, which will give a better (that is, smaller) diameter bound DP for the image of the cone, because we have more options for preimages without decreasing their contributions to the value (n) of Pω (1I ). Choosing a larger kP , however, requires a better choice of the set GP , as well; we will choose GP using Birkhoff’s pointwise ergodic theorem to find a set where the frequencies of leaking (in both directions) is high. The last ingredient in the proof is a firm understanding of how to accurately account for preimages and leaking in the lower bound. Set I− = [−1, 0] and I+ = [0, 1], and for each ω define two operators:

− 1 + 1 Rω = I− · Pω,Rω = I+ · Pω.

s Each Rω acts on integrable functions and, by taking equivalence classes, yields an 1 s operator on L (λ). If f is a density on [−1, 1] and s ∈ {−, +}, then Rω(f) is the s s restriction of the density Pω(f) to the interval I ; Rω captures information about mass staying in the same interval or leaking over. The next lemma collects the required ± properties of Rω .

Lemma 4.6.3. Let ω ∈ Ω.

− + − + 1. The operators Rω and Rω are linear and satisfy Pω = Rω + Rω .

1 s 2. For non-negative f ∈ L (λ) and s ∈ {−, +}, Rω(f) is also non-negative.

n 3. For n ≥ 1, let Γ ⊆ {−, +} be a collection of strings of the form (b0b1 . . . bn−1), (b) and for such a string b = (b0b1 . . . bn−1) let Rω be defined by

(b) bn−1 b0 Rω = Rσ2(n−1)(ω) ◦ · · · ◦ Rω .

1 (n) P (b) Then for any non-negative f ∈ L (λ), Pω (f) ≥ b∈Γ Rω (f).

128 n 4. For n ≥ 1, b = (b0b1 . . . bn−1) ∈ {−, +} , and x ∈ [−1, 1], define

n −1 −1 b (n) \ (i) bi−1 Φω(x) = Sω {x} ∩ Sω (I ). i=1

This is the set of points z that in n steps of the cocycle S are mapped to x and that at each intermediate step i are found in Ibi . Then for any integrable function f : X → R and almost every x ∈ [−1, 1], we have

n−1 ! (b) X Y (i) Rω (f)(x) = gσ2i(ω)(Sω (z)) f(z), b z∈Φω(x) i=0

and in particular for any non-trivial interval I ⊆ [−1, 1], we have:

n−1 (b) 1 b Y Rω ( I )(x) ≥ I ∩ Φω(x) · ess inf(gσ2i(ω)), i=0

b b where I ∩ Φω(x) is the (finite) cardinality of I ∩ Φω(x) and gω is the scale b function for Pω. Finally, if I ∩ Φω(x) ≥ c ≥ 0 for λ-almost every x, then

n−1 (b) 1 Y (2) ess inf(Rω ( I )) ≥ c ess inf(gσ2i(ω)). i=0

s Proof. The first two parts follow directly from the definition of the Rω. For the third part, observe that for n = 1 this is clear, by the first two parts. Suppose that it holds for some n ≥ 1. Let Γ ⊆ {−, +}n+1 and let

Γ0 = {b0 ∈ {−, +}n :(b−) or (b+) ∈ Γ} .

Then we have, for f ∈ L1(λ):

(n+1) − (n) + (n) Pω (f) = Rσ2n(ω)(Pω (f)) + Rσ2n(ω)(Pω (f)) ! ! − X (b) + X (b) ≥ Rσ2n(ω) Rω (f) + Rσ2n(ω) Rω (f) b0∈Γ0 b0∈Γ0 X (b) X (b) = Rω (f) ≥ Rω (f). b∈(Γ0−)∪(Γ0+) b∈Γ

By induction, the inequality holds for all n ≥ 1.

129 Lastly, let f : X → R be integrable, let x ∈ [−1, 1], and consider s ∈ {−, +}. Then

s 1 X X Rω(f)(x) = Is (x) gω(z)f(z) = gω(z)f(z), −1 s z∈Sω {x} z∈Φω(x)

s −1 −1 s s where the equality holds because Φω(x) = Sω {x} ∩ Sω (I ) and x ∈ I if and only if −1 s 0 n z ∈ Sω (I ). Suppose that for all b ∈ {−, +} , we have

n−1 ! (b) X Y (i) Rω (f)(x) = gσ2i(ω)(Sω (z)) f(z), b z∈Φω(x) i=0

0 n+1 and consider b = (b bn) ∈ {−, +} . We then have:

0 (b) bn (b ) Rω (f)(x) = Rσ2n(ω)(Rω (f))(x) n−1 ! 1 X X Y (i) = Ibn (x) gσ2n(ω)(y) gσ2i(ω)(Sω (z)) f(z) y∈S−1 {x} z∈Φb0 (y) i=0 σ2n(ω) ω n ! X Y (i) = gσ2i(ω)(Sω (z)) f(z), b z∈Φω(x) i=0

bn−1 bn b where we see that the intermediate y had to be in I and x ∈ I , so z ∈ Φω(x). This b 1 proves the form of Rω(f). Specifying f = I , we have that the sum is over preimages b z ∈ Φω(x) ∩ I, so we easily obtain:

n−1 n−1 (b) 1 X Y (i) b Y Rω ( I )(x) = gσ2i(ω)(Sω (z)) ≥ I ∩ Φω(x) · ess inf(gσ2i(ω)), b z∈I∩Φω(x) i=0 i=0 where the absolute value term denotes cardinality (in this case, the number of n-th preimages of x that are in I and are in Ibi at the i + 1-st time). The essential infimum calculation follows easily by this last part, after taking equivalence classes and using b the λ-almost-everywhere lower bound for I ∩ Φω(x) .

l 6 m Proof of Theorem 4.6.1. Choose ν ∈ (3/4, 1) and a = ν−3/4 , so that SωCa ⊆ Cνa for 2 all ω. As in the proof of Lemma 4.5.3, let Ω2 be Ω if σ is ergodic on Ω and Ω \ Ω1 otherwise. Define subsets G1 and G2 of Ω1 by

−1  −1 −1  Gk = Ω1 ∩ k [M1,2 , 1] ∪ σ Ω2 ∩ k [M1,2 , 1] , for k = 1, 2. These two sets are the sets where leakage happens from [−1, 0] to [0, 1] and vice versa, respectively, with some minimum amount. Note that by Lemma 4.6.2,

130 we have

κi(ω) ≥ Mκ1,κ2 = κM1,2 ⇐⇒ i(ω) ≥ M1,2 .

Because of this, G1 and G2 are independent of κ ∈ (0, 1], even though the amount of leakage scales with κ. By Birkhoff’s pointwise ergodic theorem, we know that

n−1 1 X 2i 1G ◦ σ −→ µ1(Gk) n k n→∞ i=0

µ-almost everywhere for k = 1, 2. We can find N1,N2 ∈ Z≥1 and sets C1,C2 ⊆ Ω1 such that µ1(C1 ∩ C2) ≥ 1/2 and when n ≥ Nk and ω ∈ Ck, then for f = 1 2i n−1 2 min{µ1(G1), µ1(G2)}, the orbit (σ (ω))i=0 is in Gk at least fn times. (See Lemma 0 A.2.4.) Set N0 = max{N1,N2} and G = C1 ∩ C2 and observe that for all n ≥ N0 and 0 2i n−1 ω ∈ G , the orbit (σ (ω))i=0 is in Gk at least fn times for each k = 1, 2.  log(M ) Let κ ∈ (0, 1) be a solution of N = − κ1,κ2 . For κ ≤ κ , set 0 0 log(4) 0

 log(M ) k = m + 2 − κ1,κ2 = m + 2m (κ), P 1 log(4) 1 3

−1 −1 where m1 = m1((2a) ) is the waiting time until any interval of measure (2a) covers l m − + log(Mκ1,κ2 ) one of I or I and m3(κ) = − log(4) is the number of steps it takes for a + − leaked interval of size Mκ1,κ2 to expand to cover one of I or I , both as in Corollary 0 −2m1 0 2a 4.5.4. Let GP = σ (G ) and let Q = {Qk}k=1 be the same partition of [−1, 1] into −1 uniform-size closed intervals, each with measure λ(Qk) = (2a) , as in the proof of Theorem 4.5.1. − + (m1) Observe that each Qk is contained in one of I and I ; we claim that Sω (Qk) −1 covers the same half of [−1, 1]. To see this, note that the computation for m1((2a) ) (n) involved following an interval along its orbit under Sω , and if the interval ever inter- 1 −1 sected both sets, we only followed the larger piece. For small κ, we have κ  2 (2a) , which means that the amount of leakage at any step of the orbit is small compared to the measure of the larger piece (which, in the absolute worst case, is at least −1 1 −1 (i) (2a) − κ ≥ 2 · (2a)  κ). Hence the bulk of the set Sω (Qk) lies in the same s (m1) s I in which it starts, and so indeed Sω (Qk) covers that I .

For fixed κ ∈ (0, κ0], we will be using the lower bound

1 1 ess inf(gω) ≥ = 2 . Dκ1,κ2 4(1 + κB1,2 )

(kP (κ)) 1 We will now find a lower bound for Pω ( Qk ); the primary tool is Lemma 4.6.3.

131 s Claim. Suppose that Qk ⊆ I , for s ∈ {−, +}.

• If x ∈ Is, then

(kP (κ)) 1 −2kP (κ) −m1 Pω ( Qk )(x) ≥ (1 + κB1,2 ) · 4 .

• If x ∈ I−s, then

1 1 f · m (κ) (kP (κ)) 3 P (1Q )(x) ≥ · · . ω k 2kP (κ) m1 m3(κ) (1 + κB1,2 ) 2 · 4 4

s 0 Proof of claim. Fix Qk ⊆ I , for some s ∈ {−, +}, and fix ω ∈ GP ; we will use the fact that for such ω, σ2m1 (ω) ∈ G0. By the above comment, we see that for any x ∈ Is, s there is at least one m1-th preimage z ∈ Qk for x that stays in I along its orbit; that m1 b is, if b = (s . . . s) ∈ {−, +} , then Φω(x) is non-empty. By part (4) of Lemma 4.6.3, we have

m1−1  m1 (b) b Y 1 R (1Q )(x) ≥ I ∩ Φ (x) · ess inf(gσ2i(ω)) ≥ , ω k ω 4(1 + κB )2 i=0 1,2 where gω is the scale function for Pω. We then obtain

 1 m1 (kP (κ)) (2m3(κ)) P (1 ) ≥ P (1 s ). ω Qk 2 σ2m1 (ω) I 4(1 + κB1,2 )

We now consider the two cases. First, consider the case where x ∈ Is. Then x has exactly four preimages in Is, and each of those preimages has four preimages in Is, and so on. If we let b ∈ {−, +}2m3(κ) be the string whose components are all s, then b 2m3(κ) we see that Φω(x) = 4 . By parts (3) and (4) of Lemma 4.6.3, we have:

 1 m1 (kP (κ)) (2m3(κ)) P (1 )(x) ≥ P (1 s )(x) ω Qk 2 σ2m1 (ω) I 4(1 + κB1,2 )  m1 1 (b) ≥ R (1 s )(x) 2 σ2m1 (ω) I 4(1 + κB1,2 ) 42m3(κ) −2kP (κ) −m1 ≥ = (1 + κB , ) · 4 . 2 m1+2m3(κ) 1 2 (4(1 + κB1,2 ) )

Second, consider the case where x ∈ I−s; this is where we use σ2m1 (ω) ∈ G0. We will split up the remaining iterates of the cocycle into two stages: in the first stage, we s will wait m3(κ) steps and count the number of times the map leaks I into the other half of the space (which we will denote I−s), and in the second stage, we will let the map mix for another m3(κ) steps.

132 2m1 0 Since σ (ω) ∈ G and m3(κ) ≥ N0 (by choice of κ), we see that in the next m3(κ) steps, the cocycle leaks [−1, 0] to [0, 1] and [0, 1] to [−1, 0] at least f ·m3(κ) times each. − − + + Let 0 ≤ n1 < ··· < nl(−) ≤ m3(κ) − 1 and 0 ≤ n1 < ··· < nl(+) ≤ m3(κ) − 1 be the times when leakage happens from [−1, 0] to [0, 1] and from [0, 1] to [−1, 0], respectively, (m3(κ)) along the cocycle Sσ2m1 (ω); we know that l(−), l(+) ≥ f · m3(κ). i 2m3(κ) s For 1 ≤ i ≤ l(s), let b ∈ {−, +} be the string whose first ni − 1 components are s, and the rest of the components are −s. This string follows points that stay in Is −s i l(s) for a while and then switch to I and remain there. Let Γ = {b }i=1 be the collection of these strings. By parts (3) and (4) of Lemma 4.6.3, we have:

l(s) i (2m3(κ)) X (b ) 1 s 1 s Pσ2m1 (ω) ( I )(x) ≥ Rσ2m1 (ω)( I )(x) i=1 l(s)  2m3(κ) 1 X s bi ≥ · I ∩ Φ 2m (x) . 4(1 + κB )2 σ 1 (ω) 1,2 i=1

We need to compute Is ∩ Φbi (x) . Here, we must identify how the map acts over σ2m1 (ω) s s the 2m3(κ) steps. Over the first ni − 1 steps, the interval I undergoes mixing, and at s s ni −1 s s s the end any point in I will have 4 preimages in I that stay in I for all n1 − 1 s −s −s s s steps. At the ni -th step, we follow the leakage to I ; for y ∈ I ∩ Sσ2(m1+n1)(ω)(I ), s there are at least 2 preimages of y in I (the map is symmetric). Over the next m3(κ) steps, the leakage expands to cover all of I−s, and this could potentially happen in an invertible way. Over the remaining steps, I−s mixes with itself in the same way that Is mixes. Putting all of this together tells us that:

i s s s b 2m3(κ)−m3(κ)−ni ni −1 I ∩ Φ 2m (x) ≥ 4 · 1 · 2 · 4 σ 1 (ω) = 4m3(κ) · 2−1.

−s s Putting all of this information together, we obtain, for x ∈ I and Qk ⊆ I :

 1 m1 (kP (κ)) (2m3(κ)) P (1 )(x) ≥ P (1 s )(x) ω Qk 2 σ2m1 (ω) I 4(1 + κB1,2 ) l(s) 1 1 X ≥ · 4m3(κ) · 2−1 (1 + κB )2kP (κ) 4m1+2m3(κ) 1,2 i=1 1 1 l(s) = · · 2kP (κ) m1 m3(κ) (1 + κB1,2 ) 2 · 4 4 1 1 f · m (κ) ≥ · · 3 . 2kP (κ) m1 m3(κ) (1 + κB1,2 ) 2 · 4 4

This last term is clearly smaller than 1, so this lower bound is the lower bound for

133 0 all x. Hence, for ω ∈ GP we have (again using part (4) of Lemma 4.6.3):

1 1 f · m (κ) kP (κ) 3 ess inf(P (1Q )) ≥ · · =: γ(κ). ω k 2kP (κ) m1 m3(κ) (1 + κB1,2 ) 2 · 4 4

Note that clearly, γ(κ) −→ 0. κ→0+ Finally, we obtain our diameter bound, apply Theorem 2.4.3, and perform the asymptotic calculations. As at the end of the proof of Theorem 4.5.1, we now have that 2(1 + ν)  diam (P kP (κ)C ) ≤ 2 log (1 + νa) − 2 log(γ(κ)), θa ω a 1 − ν 0 using Lemma 4.4.1. Call this diameter bound DP = c1 − 2 log(γ(κ)), writing c1 for the constant involving a and ν. Write

−c1/2 0 e µ1(GP )fM1,2 c2 = . 2 · 4m1

For κ ∈ (0, κ0], we apply the cocycle Perron-Frobenius theorem, Theorem 2.4.3, to the (n) 0 0 cocycle Pω with parameters GP , kP ,DP , to get an upper bound C1(κ) for λ2(κ). We have, using standard asymptotic estimates and the definition of kP ,

0    µ1(GP ) 1 C1(κ) = log tanh (c1 − 2 log(γ(κ))) kP (κ) 4 −2e−c1/2µ (G0 ) ∼ 1 P · γ(κ). m1 + 2m3(κ)

Moreover, we have  log(M ) log(κ) m (κ) = − κ1,κ2 ∼ − . 3 log(4) log(4) This allows us to show that

1  = exp − 2(m1 + 2m3(κ)) log(1 + κB1,2 ) ∼ 1, 2kP (κ) (1 + κB1,2 )

f · m (κ) −m3(κ) 3 that 4 ∼ κM , , and that γ(κ) ∼ · κM , . Thus we obtain: 1 2 2 · 4m1 1 2

−c1/2 0 −2e µ1(G ) C1(κ) ∼ · γ(κ) m1 + 2m3(κ) −c1/2 0 −2e µ1(G ) f · m3(κ) ∼ · m · κM1,2 m1 + 2m3(κ) 2 · 4 1

∼ −c2κ.

Thus we have λ2(κ) ≤ C1(κ) ∼ −c2κ, which is the desired result in Theorem 4.6.1.

134 One may ask about sharpness of the upper bound in Theorem 4.6.1. We do have sharpness in some sense; we will prove this fact in the next chapter.

135 Chapter 5 Markov Paired Tent Maps

In Section 4.6 we proved Theorem 4.6.1, which indicated asymptotics for the second- (n) largest Lyapunov exponent of the cocycle of operators Lω arising from a perturbation of T0,0 in paired tent maps. However, it was an upper bound, not an exact calculation; one may ask if there is a better upper bound in general, or if there is a choice of

1, 2 such that when these parameters are scaled down to 0, we actually obtain linear asymptotics for the second-largest Lyapunov exponent in the scale parameter. The goal of this chapter is to prove the latter, stated as the following theorem:

Theorem 5.0.1. Let (Ω, µ, σ), 1, and 2 be as in Theorem 4.5.1, and set 1 = 2 = 1 ∞ for all ω. There is a sequence (κn)n=1 ⊆ (0, 1/2) such that κn → 0, each Tω = Tκn,κn is Markov, and λ2(κn) ∼ −2κn.

In the theorem statement, λ2(κn) is the second-largest Lyapunov exponent for the cocycle. Because 1 and 2 are constant here, we are really just dealing with the deterministic setting, with a single map. Figure 5.1 illustrates the symmetry of Tκ,κ, which greatly simplifies the behaviour of the map and allows us to perform extremely interesting calculations. From here on, set Tκ := Tκ,κ. To prove Theorem 5.0.1, we do not use Theorem 2.4.3, because it cannot give us the exact asymptotics required; instead, we use the fact that we can find a sequence of Markov maps Tκn , and exploit the fact that the interesting spectral information of the associated Perron-Frobenius operator Lκn is determined by the associated ad- jacency matrix Aκn corresponding to its Markov partition. We apply methods from determinant-free linear algebra, ring theory, and complex analysis to compute the rel- evant spectral information of Aκn and determine the exact asymptotics of the second- largest Lyapunov exponent.

5.1 Markov Maps and Partitions

We first give the definition of Markov maps and their associated partitions, in the setting of maps on a compact subinterval of R. For our purposes, these maps are most

136 1

0.5

0

-0.5

-1 -1 -0.5 0 0.5 1

Figure 5.1: The paired tent map Tκ,κ, with parameter κ = 0.3. helpful when the map is also piecewise-linear; conveniently, each paired tent map is piecewise linear.

Definition 5.1.1. A map T :[a, b] → [a, b] is Markov when there is a finite collection r {Ri}i=1 of disjoint open intervals in [a, b] such that: S 1.[ a, b] \ i Ri is the collection of endpoints of the intervals {Ri}, and

2. if Ri intersects T (Rj), then all of Ri is contained in T (Rj).

The collection {Ri} is called a Markov partition for T , even though it is not a partition, strictly speaking.

The next lemma is a combination of Theorem 9.2.1 in [7] (the matrix restriction), Lemma 3.1 in [6] (the relationship between the spectrum of the full operator and the spectrum of the matrix), and Theorem 1 in [26] applied to the case of piecewise-linear Markov maps with finitely many branches.

Lemma 5.1.2. Suppose that the map T :[a, b] → [a, b] is piecewise-linear (with finitely r many branches) and Markov, with Markov partition {Ri}i=1; note that T has constant 1 slope mi on each Ri. If V = spanC { Ri : 1 ≤ i ≤ r} and L is the Perron-Frobenius operator for T , then V is L-invariant (considered as a subspace of BV (λ)). Define r 1 an isomorphism φr : V → C by φ( Ri ) = ei. Then the restriction of L to V can be

137 represented by the r-by-r matrix M = [mij], where

( 1 Ri ⊆ T (Rj), |mj | mij = 0, otherwise, with φr ◦ L = M ◦ φr. Moreover, the essential spectral radius for L acting on BV (λ) is −1 1/n ρ = lim |(T n)0| , and we have n→∞ ∞  σ(L) \ B(0, ρ) = σ L V \ B(0, ρ) = σ(M) \ B(0, ρ).

Finally, if the absolute values of the slopes of the branches of T are all equal to m, then −1 M = m A, where A = [aij] is the adjacency matrix for T , given by ( 1 Ri ⊆ T (Rj), aij = 0, otherwise, and the essential spectral radius for L acting on BV (λ) is ρ = m−1, so that

σ(L) \ B(0, m−1) = m−1σ(A) \ B(0, m−1).

The upshot of Lemma 5.1.2 is that for piecewise-linear Markov maps, we can com- pute spectral data from a matrix instead of dealing with the full operator, and if the slopes are constant over all branches (up to sign), then that matrix is very combinato- rial in nature and amenable to analysis. Because our maps are indeed piecewise-linear with uniform slope up to sign, we see that when Tκ is Markov, to find the largest eigenvalues for Lκ it suffices to look only at the spectrum of the matrix Aκ, for which we have all of our linear algebra tools. In particular, we can look at the spectrum of Aκ to find the second-largest eigenvalues. So, we ask: when are these maps Markov? A general sufficient condition for piecewise linear maps is given by the following lemma, which says that it is enough for the endpoints of monotonicity intervals to be invariant in finitely many steps. We may then apply the lemma to Tκ by investigating the images of ±1/2.

Lemma 5.1.3. Let T :[a, b] → [a, b] be an onto piecewise linear map and let E0 be the set of endpoints of the intervals of monotonicity for T . For each i ≥ 1, let Ei = ± {T (s ): s ∈ Ei−1}. Suppose that there exists m such that Em = Em+1. Then T is R M Markov, with Markov partition {Ri}i=1, where {ri}i=0 enumerates Em in an increasing way and Ri = (ri−1, ri).

Proof. Let m be the smallest m such that Em = Em+1; let ri and Ri be defined as in the statement of the lemma. Since the union of the intervals {Ri} and their endpoints

138 is the same as the union of the intervals of monotonicity along with those endpoints, S [−1, 1] \ i Ri is the endpoints of the Ri. Then, since Em = Em+1, for each j we have T (Rj) = (rk, rl) for some k < l depending on j. Thus, if Ri ∩ T (Rj) 6= ∅, we must have k ≤ i < l, since the intervals Ri are disjoint; hence Ri ⊆ T (Rj). Hence T is Markov.

We now apply Lemma 5.1.2 in the specific case of our paired tent maps Tκ.

∞ Lemma 5.1.4. There exists a decreasing sequence (κn)n=1 ⊆ (0, 1/2) such that Tκn is n Markov and κn −→ 0. Each κn satisfies (2 + 2κ) κ = 1. The Markov partition for n→∞

Tκn is, for n = 1,

n 1  1  1  1  o −1, − 2 , − 2 , −κ1 , (−κ1, 0) , (0, κ1) , κ1, 2 , 2 , 1 , and for n ≥ 2,

n−2 n o n i i+1  o (−1,Tκn (−κn)) ∪ Tκ (−κn),Tκ (−κn) n n i=1 n o ∪ T n−1(−κ ), − 1  , − 1 , −κ  , (−κ , 0) , (0, κ ) , κ , 1  , 1 ,T n−1(κ ) κn n 2 2 n n n n 2 2 κn n n−2 n i+1 i  o n o ∪ Tκ (κn),Tκ (κ) ∪ (Tκn (κn), 1) . n n i=1

Remark 5.1.5. The Markov partitions for T1 and T4, are shown in Figures 5.2(a) and

5.2(b). The case n = 1 is distinct because the branch of the map used for x = κn is i different than the branches used for the further iterates Tκn (κn). In each picture, one may visually confirm that the collection of intervals actually is a Markov partition by checking that the image of each (horizontal) interval stretches vertically over a union of consecutive intervals (that is, whenever the graph crosses a vertical line, it always crosses a horizontal line).

Proof. Consider the paired tent map Tκ for κ ∈ (0, 1/2). We will use Lemma 5.1.3 to

find conditions on κ that make Tκ Markov. Start with E0 = {−1, −1/2, 0, 1/2, 1}. The map Tκ is continuous everywhere except at 0, for which the one-sided limits are ±1, and we have Tκ(−1) = −1 and Tκ(1) = 1. Then Tκ(−1/2) = κ and Tκ(1/2) = −κ, so we consider iterates of κ under Tκ; by symmetry, iterates of −κ will work similarly. In i i particular, we will find, for each n ≥ 1, a κn such that Tκn (κn) = 1−(2+2κn) κn > 1/2 for 1 ≤ i < n and n n Tκn (κn) = 1 − (2 + 2κ) κ = 0. First, consider the equation (2 + 2κ)nκ − 1 = 0. Rearrange and take logarithms to obtain − log(κ) n = h(κ) := . log(2 + 2κ)

139 1 1

R R R R R R R R R 7 8 9 10 11 12 4 5 6 0 0 R R R 1 2 3 R R R R R R 1 2 3 4 5 6

-1 -1 -1 0 1 -1 0 1

(a) n = 1 (b) n = 4

Figure 5.2: Markov partitions for Tκn , with n = 1, 4.

The function h(κ) is decreasing on (0, ∞), is unbounded as κ tends to 0, and has h(1/2) = log(2)/ log(3) < 1. Thus we conclude that for each n ∈ Z≥1, there exists a n 1 unique κn ∈ (0, 1/2) solving (2 + 2κ) κ − 1 = 0, and κn decreases to 0. i Fix n ≥ 2; we will show that Tκn (κn) > 1/2 for each i = 1, . . . , n − 1. For each of those i, we have:

i 1 1 1 (2 + 2κn) κn = n−i ≤ < . (2 + 2κn) 2 + 2κ 2

Observe that for i = 1, we have

1 T (κ ) = 1 − (2 + 2κ )κ > . κn n n n 2

i By repeated application of Tκn and use of the upper bound on (2 + 2κn) κn, we see that for all 1 ≤ n − 1, 1 T i (κ ) = 1 − (2 + 2κ )iκ > . κn n n n 2 n i We have proven that Tκn (κn) = 0, and Tκn (κn) > 1/2 for 1 ≤ i < n; the symmetric statement holds for −κn.

Finally, fix n ≥ 1. We claim that En is where the sequence of Ei terminates. To see this, observe that

i n i n−1 En = E0 ∪ {Tκn (±1/2)}i=1 = {−1, −1/2, 0, 1/2, 1} ∪ {Tκn (±κn)}i=0 ,

n and note that we just saw that Tκn (±κn) = 0, so that En+1 = En. This shows that

Tκn is Markov, using Lemma 5.1.3. The listed Markov partitions are given by tracing

1 n Another way to see this claim is by observing that (n, κ) 7→ Tκ (1/2) is increasing, in both n and κ (where this property makes sense).

140  1 0 0 0 0 0 1 0 0 0 0 0 0 0   1 0 0 0 0 1 0     0 1 0 0 0 1 0     0 0 1 0 0 1 0     . .. . .   . . . .     0 0 0 1 0 1 0     0 0 0 1 0 1 0 0 0 0 0 0 0 0     0 0 0 1 0 1 0 0 1 1 0 0 0 0     0 0 0 0 1 1 0 0 1 0 1 0 0 0     0 0 0 0 0 0 0 0 1 0 1 0 0 0     0 1 0 1 0 0 0     . . . .   . . .. .     0 1 0 0 1 0 0     0 1 0 0 0 1 0     0 1 0 0 0 0 1  0 0 0 0 0 0 0 1 0 0 0 0 0 1

Figure 5.3: General form of the (2n + 4)-by-(2n + 4) adjacency matrix An.

i Tκn (±κn) as i runs from 0 to n − 1.

We now see that Tn := Tκn is Markov for each n ≥ 1. From the graph of the maps and the form of the Markov partition, it is easy to read off the adjacency matrix

An := Aκn ; since the partition has 2n + 4 pieces, the matrix is (2n + 4)-by-(2n + 4). For n ≥ 4 the general form of the matrix is as in Figure 5.3. For n ≤ 3 some of the columns are combined.

The spectrum of the representation of the Perron-Frobenius operator, Mn := Mκn , −1 is just the spectrum of An scaled by (2(1 + κn)) , so we may focus our analysis on

An. For each n, let 1 Vn = spanC { Ri : 1 ≤ i ≤ 2n + 4} , where {Ri} is the Markov partition for Tn.

5.2 Spectral Properties of An

Observe that the Markov partition for Tn is symmetric about 0. Moreover, observe that for any κ, Tκ is odd: Tκ(−x) = −Tκ(x). In particular, if ψ(x) = −x, then Tn◦ψ = ψ◦Tn.

The map ψ has a Perron-Frobenius operator, Lψ, and the commutation relation says that LnLψ = LψLn. We also see that the map ψ is Markov on the same partition as

Tn and, noting the symmetry of this partition, we obtain ψ(Ri) = R2n+5−i. Thus the action of Lψ on Vn is represented by the matrix Jn, as shown in Figure 5.4, and we

141 0 0 0 0 0 1 0 0 0 ... 0 1 0   0 0 0 1 0 0    . .. .   . . .    0 0 1 0 0 0   0 1 0 ... 0 0 0 1 0 0 0 0 0

Figure 5.4: The (2n + 4)-by-(2n + 4) matrix Jn.

2 have MnJn = JnMn, so also AnJn = JnAn. It is also clear that because ψ = id, we 2 −1 have Jn = I. Because Jn = Jn, we have JnAnJn = An. Left-multiplication by Jn reverses the order of the rows and right-multiplication by Jn reverses the order of the columns, so the combination of both of them is performing a half-circle rotation of the matrix; we thus have independent verification of the half-circle rotational symmetry of

An, which could be seen from Figure 5.3. Recall (see Theorem 1.3.19 in [25]) that if two diagonalizable matrices commute, then they are simultaneously diagonalizable, meaning that there is a shared basis of eigenvectors for the matrices. Also note that if two r-by-r matrices A and B commute, then C2n+4 becomes a left-C[x, y]-module, by setting p(x, y)v := p(A, B)v for polyno- mials p(x, y) ∈ C[x, y] and v ∈ C2n+4. We will now use these facts to find many spectral properties of An, using its relation with Jn; note that we will find the spectral data of the entire sequence of An all at once! We use Axler’s approach to determinant-free linear algebra [3] and the practical implementation of those ideas by McWorter and

Meyers [36]. Moreover, we make significant use of the underlying map Tn to read off the algebraic relationships satisfied by An and Jn without doing a single matrix com- putation. We therefore reduce much of the study of the Perron-Frobenius operators to matrices that are easily studied by looking directly at the underlying maps.

n+1 n Lemma 5.2.1. We have An(An − 2An − 2Jn) = 0.

Proof. Zooming in on the interval [−1/2, 0], as in Figure 5.5, we can identify the intervals Rn−1 through Rn+2. The interval Rn−1 is the interval immediately to the left of the left zero of Tn in [−1, 0]; the interval Rn is the left branch of the leaking from

[−1, 0] to [0, 1]; the interval Rn+1 is the large interval (−1/2, −κn); and the interval

Rn+2 is the interval (−κn, 0).

Looking at the map Tn and using the Markov partition, we see that for i ≤ n − 1, the interval Rn+2 is mapped to R1 and is subsequently expanded to the interval (−1, ri) in i total steps, which is represented by

i Anen+2 = e1 + ··· + ei.

142 0

R R R R R n-1 n n+1 n+2 n+3

-1

-0.5 0

Figure 5.5: A zoomed-in look at the Markov partition for Tn in [−1, 0].

n−1 In the case of i = n − 1, we have Tn (Rn+2) = (−1, rn−1), and because rn−1 is the left n zero for Tn in [−1, 0], we have Tn (Rn+2) = (−1, 0), which is represented by

n An(en+2) = e1 + ··· + en+2.

Then, we clearly have Tn(−1, 0) = (−1, κn), so that because Tn is (except at −1/2) 2-to-1 on [−1, 0], we have

n+1 n An en+2 = 2(e1 + . . . en+3) = 2Anen+2 + 2Jnen+2,

n+1 n where we used Jnei = e2n+5−i. We rearrange this to (An − 2An − 2Jn)en+2 = 0. Because An and Jn commute, we see that for any polynomial p ∈ C[x, y], we have

n+1 n (An − 2An − 2Jn)p(An,Jn)en+2 = 0.

i Now, by the equations for Anen+2, the fact that Jnen+2 = en+3, and the fact that An and Jn commute, we see that

spanC {p(An,Jn)en+2 : p ∈ C[x, y]}

= spanC {e1, . . . , en−1, (en + en+1), en+2,

en+3, (en+4 + en+5), en+6, . . . , e2n+4} .

We can see that acting on the vector en+2 by An and Jn does not separate en and en+1,

143 or en+4 and en+5, by observing that any image of an Ri either does not intersect Rn and Rn+1 or covers both (and similarly for Rn+4 and Rn+5). Moreover, this subspace does not contain the vectors v1 = e1 + ··· + en − (en+1 + en+2) and v2 = Jnv1. These are two linearly independent vectors that both lie in the kernel of An; to see they lie in the kernel, observe that the two vectors are representing 1(−1,−1/2) − 1(−1/2,0) and the reflection 1(1/2,1) − 1(0,1/2), and the intervals stretch to the same image. All together, we now have a basis for C2n+4, every element of which is annihilated by n+1 n n+1 n An(An − 2An − 2Jn), and hence An(An − 2An − 2Jn) = 0.

2n+4 If x ∈ C , we say that x is symmetric when xi = x2n+5−i for all i, and antisym- + − metric when xi = −x2n+5−i for all i. Let E and E be the subspaces of symmetric and antisymmetric vectors in C2n+4, respectively. Recall, as well, that a matrix is diag- onalizable if and only if its minimal polynomial is separable; see, for example, Section 4 of [3].

+ Lemma 5.2.2. For all n ≥ 1, Jn is diagonalizable, with eigenspace E corresponding to the eigenvalue 1 and eigenspace E− corresponding to the eigenvalue −1. Moreover, ± the eigenspaces E are An-invariant.

2 Proof. We have Jn = I, so that (Jn − I)(Jn + I) = 0. Since Jn ± I 6= 0, we see that the minimal polynomial of Jn is

2 mJn (x) = x − 1 = (x − 1)(x + 1), so Jn is diagonalizable with eigenvalues ±1. The projections onto the eigenspaces E+1 and E−1 are given by

J + I J − I n = 1 (I + J ), n = 1 (I − J ), 1 + 1 2 n −1 − 1 2 n respectively, by normalizing the factor of the minimal polynomial that does not an- + nihilate the appropriate space. This fact immediately implies that E+1 = E and − + − E−1 = E . Finally, An and Jn commute, so for s ∈ E and a ∈ E we have

JnAns = AnJns = Ans, JnAna = AnJna = −Ana,

± thus showing that E are An-invariant.

n n For notation, for all n ≥ 1 let fn(x) = x (x − 2) − 2, gn(x) = x (x − 2) + 2, and n hn(x, y) = x (x − 2) − 2y.

Lemma 5.2.3. The polynomials fn and gn are irreducible over Q[x], separable with no roots at zero, and do not share any roots.

144 Proof. For irreducibility, apply Eisenstein’s Criterion with p = 2 in both cases, followed by Gauss’s Lemma. Since Q has characteristic zero, fn and gn are both separable (characteristic zero fields are perfect). Neither polynomial has 0 for a root (they both have non-zero constant terms), and since fn(x) = gn(x)−4, the two polynomials cannot share any roots.

Proposition 5.2.4. Let n ≥ 1. We have:

1. the kernel of An is ker(An) = spanC{v1 + Jnv1} ⊕ spanC{v1 − Jnv1}, for v1 = e1 + ··· + en − (en+1 + en+2);

+ 2. for s ∈ E , h(An,Jn)s = fn(An)s, and the minimal polynomial of An restricted + to E is xfn(x);

− 3. for a ∈ E , h(An,Jn)a = gn(An)a, and the minimal polynomial of An restricted − to E is xgn(x);

4. the minimal polynomial of An is

2n+2 2n+1 2n mAn (x) = xfn(x)gn(x) = x(x − 4x + 4x − 4);

5. the characteristic polynomial of An is χAn (x) = xmAn (x);

6. An is diagonalizable over C, with all eigenvectors corresponding to roots of fn being symmetric and all eigenvectors corresponding to roots of gn being antisym- metric.

Proof. First, we have already seen (in the proof of Lemma 5.2.1) that v1 and Jnv1 form a basis for the kernel of An, so v1 + Jnv1 and v1 − Jnv1 also form a basis of the kernel of An (one that conveniently splits into a symmetric and antisymmetric part). ± + − Observe that Jn restricted to E is ±I. Thus, we have, for s ∈ E and a ∈ E :

n+1 n hn(An,Jn)s = hn(An,I)s = (An − 2An − 2I)s = fn(An)s, n+1 n hn(An,Jn)a = hn(An, −I)a = (An − 2An + 2I)a = gn(An)a.

+ − Thus the minimal polynomial for An restricted to E and to E are factors of xfn(x) and xgn(x), respectively, because Anhn(An,Jn) = 0. Then, since

An(en+2 ± Jnen+2) 6= 0,

fn(An)(v1 + Jnv1) 6= 0, gn(An)(v1 − Jnv1) 6= 0,

+ − we see that the minimal polynomials for An restricted to E and E must be exactly 2n+4 + − equal to xfn(x) and xgn(x). Then the minimal polynomial for An on C = E ⊕E

145 ± is the lowest common multiple of the minimal polynomials for An on each subspace E , which means that mAn (x) = xfn(x)gn(x) (as fn and gn share no roots). The degree of mAn (x) is 2n + 3, but we know that the kernel is two-dimensional, so the characteristic polynomial must be χAn (x) = xmAn (x), since the degree of χAn (x) is exactly 2n + 4. Lastly, the minimal polynomial is separable by Lemma 5.2.3, so we see that An is diagonalizable. Since An and Jn commute, there is a basis of shared eigenvectors for An and Jn. If v is a non-kernel eigenvector for An, then as an eigenvector for Jn it is either + − + an element of E or E ; when Anv = λv for λ a root of fn, then v ∈ E since the + minimal polynomial for E is xfn(x) and λ is not a root of gn. Similarly, an eigenvector − corresponding to a root of gn is an element of E . The proof is complete.

We see that An has zero as an eigenvalue with multiplicity two and the non-zero eigenvalues of An are the roots of the polynomials fn and gn. We will now show that for n ≥ 5, both fn and gn have a real root near 2, one larger and one smaller respectively, and that all of the other roots are found near the unit circle. Figure 5.6 shows the roots of mAn for four different n.

Proposition 5.2.5. The polynomial fn has the spectral radius of An, 2 + 2κn, as a 1 root, and κn ∼ 2n . For n ≥ 5, the polynomial gn has a real root at 2 − 2rn < 2, with rn ∼ κn. For all n ≥ 1, all other roots of fn and gn are outside the circle of radius −1 1 − n , and for all n ≥ 6, all other roots of fn and gn are inside the circle of radius 1 + n−1.

Proof. Observe that substituting 2 + 2κn into fn yields:

n fn(2 + 2κn) = (2 + 2κn) (2 + 2κn − 2) − 2 n = 2 ((2 + 2κn) κn − 1) = 0, by the definition of κn. To see roughly how big κn is, observe that 1 1 1 1 1 > = κ > = · . 2n (2 + 2κ )n n 2 n 2n 1 n n 2 + 2n 1 + 2n

−n n −n Since (1 + 2 ) converges to 1 as n tends to infinity, we see that κn ∼ 2 . −n Next, observe that applying gn to 2(1 − 2 ) yields

 1 n  1  g (2(1 − 2−n)) = 2n 1 − 2 − 2 · − 2 + 2 n 2n 2n  1 n = −2 1 − + 2 > 0. 2n

−n −n Then, evaluate gn at 2(1 − (1 + 2n/2 ) · 2 ) and use Bernoulli’s inequality (used here

146 1.5 1.5

1 1

0.5 0.5

0 0

-0.5 -0.5

-1 -1

-1.5 -1.5

-2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5

(a) n=6 (b) n=11

1.5 1.5

1 1

0.5 0.5

0 0

-0.5 -0.5

-1 -1

-1.5 -1.5

-2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5

(c) n=16 (d) n=29

Figure 5.6: Pictures of the roots of fn and gn for different values of n; roots of fn are marked with crosses, roots of gn are marked with circles, and the origin is marked with an asterisk (where An has a double eigen- value). The circle of radius 2 is a dashed line, the unit circle is a solid line, and the circles with radius 1 ± n−1 are dotted lines.

147 in the form (1 − x)n > 1 − nx for 0 < x < 1 and n ≥ 1):

   1 2n   1 2nn  1 2n g 2 1 − + = −2n+1 1 − + + + 2 n 2n 4n 2n 4n 2n 4n    n 2n2   2n ≤ 2 1 − 1 − + 1 + 2n 4n 2n 2n  4n 4n2  = −1 + + . 2n 2n 4n

The quantity inside the parentheses is decreasing for n ≥ 2 and is negative for n ≥ 5, −n so by continuity of gn there exists a root 2 − 2rn of gn for n ≥ 5, where 2 < rn < −n −n −n 2 + 2n4 . Thus rn ∼ 2 ∼ κn.

Lastly, we use Rouch´e’sTheorem to estimate the other roots of fn and gn. Set n −1 −1 a(z) = 2 and bn(z) = z (z − 2). For |z| = 1 − n , we have |z − 2| ≤ |z| + 2 = 3 + n and hence (noting that (1 − x)n < (1 + x)−n for 0 < x < 1 and n ≥ 1):

 1 n  1  |b (z)| = |z|n |z − 2| ≤ 1 − 3 + n n n 3 + 1/n 3 + 1 < n ≤ n = 2 = |a(z)| , (1 + 1/n) 1 + n where the last inequality came from the first two terms of the Binomial expansion.

We apply Rouch´e’sTheorem to see that a(z) and a(z) ± bn(z) = gn(z), −fn(z) (so −1 also fn(z)) have the same number of roots inside |z| = 1 − n : none, because a(z) is constant and therefore has no roots. On the other hand, for |z| = 1 + n−1, we have |z − 2| ≥ 2 − |z| = 1 − n−1, so again using the Binomial expansion (three terms, this time), we get:

 1 n  1  |b (z)| = |z|n |z − 2| ≥ 1 + 1 − n n n  n n(n − 1)  1  > 1 + + 1 − n 2n2 n 5 1   1  = − 1 − . 2 2n n

This last quantity is clearly increasing, and for n ≥ 6 it is larger than 2 = |a(z)|. Thus, for n ≥ 6, Rouch´e’sTheorem says that bn(z) and bn(z) ± a(z) = gn(z), fn(z) have the −1 same number of roots inside |z| = 1 + n : n, because the n + 1 roots of bn(z) are 0 with multiplicity n and 2 with multiplicity 1, and 2 is certainly outside the circle of radius 1 + n−1 if n ≥ 6.

Corollary 5.2.6. For all n ≥ 1, the spectral radius of An is 2(1 + κn), so the spectral radius of Mn is 1.

148 1

0.5

0 0 0.5 1

˜ Figure 5.7: The map Tκ, for κ = 0.3.

Proof. For n ≤ 5, one may use a computer to show that the only root of mAn (x) at least of magnitude 2 is 2 + 2κn. For n ≥ 6, we use Proposition 5.2.5 to conclude that the largest eigenvalue is 2 + 2κn. The spectrum of Mn is simply the spectrum of An −1 scaled by (2 + 2κn) , so the spectral radius of Mn is 1.

Remark 5.2.7. We identified Jn as a (2n + 4)-by-(2n + 4) permutation matrix with ones along the anti-diagonal. To add to our knowledge about Jn, we can also identify 2n+4 + 2 Jn as a 2-by-2 flip, by using tensor products: C ' E ⊗C C , and Jn is the flip in the second coordinate.

5.3 Spectral Properties of a Factor System

We may use our knowledge of Tκn to study a related system. Define π :[−1, 1] → [0, 1] ˜ by π(x) = |x| and set Tκ := π ◦ Tκ = |Tκ| : [0, 1] → [0, 1], as depicted in Figure 5.7. We can ask the same questions about this map: does it have an invariant density? Is it mixing, and if so with what rate? Instead of repeating all of our work, however, ˜ we can use the factor map relationship between Tκ and Tκ and the information about

Tκ to answer these questions, again by reducing the computations to painless matrix relations.

Observe that because Tκ is odd, for any x ∈ [−1, 1] we have that Tκ maps {±x} to {±Tκ(x)}. Looking at {±x} as an equivalence class under x ∼ −x, we see that the map π defined above collapses each class to a single point in [0, 1]: the shared absolute

149 ˜ value of the elements of the class. From the definition of Tκ, then, it is clear that ˜ π ◦ Tκ = Tκ ◦ π. ˜ ˜ In addition, for each n ≥ 1, Tn := Tκn is still Markov. The Markov partition is not quite the same as the partition on [0, 1] for Tn; the map is no longer monotonic on just two intervals, but rather four intervals. However, we can easily guess a partition; the 1 κ1 interval Rn+4 should be split in two. Note that 1 − κ1 = + and for all n ≥ 2, 2 2(1+κ1) T˜n−1(κ ) = 1 + κn ; this point is the zero of T˜ larger than 1/2 and the symmetric n n 2 2(1+κn) n point 1 − κn is the zero smaller than 1/2. 2 2(1+κn) ˜ Lemma 5.3.1. The map Tn is Markov for each n ≥ 1. For n = 1, the Markov partition is n 1  1  o (0, κ1) , κ1, 2 , 2 , 1 − κ1 (1 − κ1, 1) , and for n ≥ 2, the Markov partition is n       o (0, κ ) , κ , 1 − κn , 1 − κn , 1 , 1 , 1 + κn n n 2 2(1+κn) 2 2(1+κn) 2 2 2 2(1+κn) n   on−2 n   o ˜i+1 ˜i ˜ ∪ Tn (κn), Tn(κ) ∪ Tn(κn), 1 . i=1 The Markov partition has, in all cases, n + 3 intervals.

Proof. We again apply Lemma 5.1.3. For n = 1, the Markov partition is simply the ˜ ˜ (interiors of the) intervals of monotonicity, since T1(κ1) = 0 and T1(1/2) = κ1; there are 4 = 1 + 3 intervals. For n ≥ 2, the point 1 − κn is mapped to 0, and the 2 2(1+κn) remainder of the points are just as in the case of Tκn . Because we have split one of the intervals in [0, 1] in two (but are only considering [0, 1], not [−1, 1]), there are exactly n + 3 elements in the Markov partition.

The Markov partitions for n = 1, 4 are illustrated in Figure 5.8. We will denote the n+3 intervals in order left-to-right by {S1}1 . For n ≥ 4 the general form of the adjacency matrix Bn is as in Figure 5.9. For n ≤ 3 some of the columns are combined. Observe that Bn is almost identical to the bottom-right quadrant of An, with the exception of an extra column; this is expected, given that we modified the Markov partition by splitting Rn+4 into S2 and S3 while leaving the other intervals the same. ˜ We now compute the spectral data for Bn. Towards this goal, for notation let Vn 1 n+3 ˜ ˜ n+3 be the span of the functions { Si }i=1 , and let φ : Vn → C be the isomorphism ˜ 1 from Lemma 5.1.2, denoting di := φ( Si ). Our proof will run through the action of + 1 An on E ; for each i between 1 and n + 2, let si = 2 (en+3−i + en+2+i), and observe n+2 + n+2 n+2 that {si}i=1 is a basis for E . Then, call Cn : C → C the matrix representation + n+2 of An on E = spanC{si}i=1 . Moreover, let ι be the matrix representations of the + n+3 inclusion of E into C by splitting up s2 into d2 + d3, as pictured in Figure 5.10, so that ι(s1) = d1, ι(s2) = d3 + d4, and ι(sk) = dk+1 for k ≥ 3.

150 1 1

0.5 0.5

S S S S S S S 1 2 3 4 5 6 7 S S S S 1 2 3 4 0 0 0 0.5 1 0 0.5 1

(a) n=1 (b) n=4 ˜ Figure 5.8: Markov partitions for Tn, for n = 1, 4.

0 1 1 1 1 0 0 0 0 1 0 0 1 0 0 0   0 1 0 0 1 0 ... 0 0   0 1 0 0 1 0 0 0   0 1 0 0 0 1 0 0    . ..   . .    0 1 0 0 0 0 1 0   0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1

Figure 5.9: General form of the (n + 3)-by-(n + 3) adjacency matrix Bn.

1 0 0 0 0 1 0 0   0 1 0 0   0 0 1 0    ..   . 0 0 0 0 1

Figure 5.10: The (n + 3)-by-(n + 2) matrix ι, representing the inclusion E+ → Cn+3.

151 0 2 1 1 0 0 0 0 1 0 1 0 0 0   0 1 0 1 0 0 0    . .. .   . . .    0 1 0 0 1 0 0   0 1 0 0 0 1 0   0 1 0 0 0 0 1 1 0 0 0 0 0 1

+ Figure 5.11: The (n + 2)-by-(n + 2) matrix Cn, representing the action of An on E .

Proposition 5.3.2. Let n ≥ 1. We have:

1. ιCn = Bnι, with Cn as given in Figure 5.11;   2. the kernel of Bn is ker(Bn) = spanC{ d3 − d4 , d1 + d2 − (d5 + ··· + dn+3) };

3. the minimal polynomial for Bn is minBn (x) = xfn(x);

2 4. the characteristic polynomial for Bn is charBn (x) = x fn(x);

5. Bn is diagonalizable, the spectral radius of Bn is 2 + 2κn, and all of the other

eigenvalues of Bn are zero or near the unit circle. The eigenvector corresponding to the spectral radius is ι(w), where w ∈ E+ is the eigenvector corresponding to

the spectral radius for Cn.

+ Proof. First, note that the action of An restricted to E can be seen as identifying the vectors ei and e2n+5−i and looking at the action of An on the vectors en+3 to e2n+4, + n+2 1 because E = spanC{si}i=1 with si = 2 (en+3−i + en+2+i). The columns of the matrix An indicate the images under Tn of the intervals Ri for each i, and so the columns of the restriction Cn indicate the images under Tn of the intervals Rn+3 up to R2n+4 under the identification of Ri with R2n+5−i (considered with multiplicity). However, this is exactly what the columns of Bn indicate, because Bn is the adjacency matrix ˜ n+3 for Tn, taken with the refined partition {Si}i=1 . Since ι represents the refinement of the partition, we have ιCn = Bnι. From this equality we can obtain the remainder of the results in Proposition 5.3.2. ˜  Looking at Tn, it is clear that the kernel of Bn is equal to span { d3 − d4 , d1 +  C d2 −(d5 +···+dn+3) }, because these two vectors represent the two facets of symmetry ˜ in Tn (the symmetry in the long branches and the symmetry in the short branches).

To find the minimal polynomial for Bn, recall from Proposition 5.2.4 that the + minimal polynomial for An restricted to E is equal to xfn(x). Thus, we have

Bnfn(Bn)ι = ιCnfn(Cn) = 0,

152 + since Cn represents An acting on E and so satisfies the minimal polynomial. We also have that n+3 C = Im(ι) ⊕ spanC{d3 − d4}, because ι is injective (with rank n + 2) and d3 − d4 is not in the image of ι. Since

Bnfn(Bn) annihilates both the image of ι and d3 −d4 but fn(Bn)(d3 −d4) = −2(d3 −d4) and Bn(d1) 6= 0, we see that the minimal polynomial of Bn is mBn (x) = xfn(x). The 2 characteristic polynomial for Bn is χBn (x) = x fn(x), of course, because the kernel of

Bn is two-dimensional and the degree of χBn (x) is n + 3. Finally, the minimal polynomial for Bn is separable, so Bn is diagonalizable. By

Proposition 5.2.5, the largest eigenvalue of Bn is 2 + 2κn, and all other eigenvalues + of Bn are zero or near the unit circle (asymptotically). If w ∈ E is the eigenvector corresponding to 2 + 2κn for Cn, then

Bn(ι(w)) = ι(Cnw) = (2 + 2κn)ι(w), so ι(w) is the eigenvector for Bn corresponding to 2 + 2κn.

Observe that Bn shares no eigenvalues corresponding to the antisymmetric eigen- vectors for An; this makes sense, since the map π collapsed all of those vectors to 0, and we are left with the the symmetric eigenvectors. It did, however, introduce a new kernel vector, by introducing a new aspect of symmetry. In addition, note that we could not simply apply Lemma 5.1.2 with the matrix ˜ 2n+4 Cn, because Tn is not Markov with respect to the partition {Ri}i=n+3. However, the relationship between Cn and Bn allowed us to painlessly translate facts about An (and ˜ Cn) into facts about Bn, the actual adjacency matrix for Tn.

5.4 Mixing Rates and Times

We can now prove Theorem 5.0.1 and obtain the sharpness result for our estimate on the second-largest Lyapunov exponents for certain perturbations of paired tent maps, Theorem 4.6.1. This computation is reproduced as the following Corollary, and −1 describes how the second-largest eigenvalue (2 − 2rn)(2 + 2κn) for Mn approaches 1 as n tends to infinity.

Corollary 5.4.1. The second largest eigenvalue for Mn, and hence for Pn, is asymp- totically equivalent to 1 − 2κn.

Proof. For n ≥ 6, we know that the second largest eigenvalue in modulus for An is

2 − 2rn, so the second largest eigenvalue in modulus for Mn, and thus Pn (by Lemma −1 5.1.2), is (2 − 2rn)(2 + 2κn) . By Proposition 5.2.5, we have rn = κn + o(κn). Thus

153 we have:

2 − 2rn −1 = (1 − rn)(1 + κn) = (1 − κn + o(κn))(1 − κn + o(κn)) 2 + 2κn

= 1 − 2κn + o(κn).

Proof of Theorem 5.0.1. Using Lemma 5.1.2, we know that the two largest eigenvalues of Lκn are the two largest eigenvalues of Mn, and Corollary 5.4.1 says that while the largest eigenvalue is 1, the second largest is asymptotically 1 − 2κn. Then, to find the second-largest Lyapunov exponent for the map, we take the logarithm of the eigenvalue and use a standard asymptotic form:

log(1 − 2κn) ∼ −2κn.

We can also compute the mixing rate for the factor map system.

Corollary 5.4.2. The second-largest eigenvalue of the Perron-Frobenius operator for ˜ −1 −1 Tn has modulus at most (1 + n )(2 + 2κn) (for n ≥ 6), which is asymptotically 1 −1 equivalent to 2 (1 + n ).

Proof. The spectral radius of Bn is still 2 + 2κn, by Corollary 5.2.6 and Proposition ˜ 5.3.2, so the spectral radius of the Perron-Frobenius operator for Tn is 1 and the second- −1 largest eigenvalue has modulus at most 1 + n divided by 2 + 2κn, using Proposition −n 5.2.4 to get the upper bound. We then have (since κn ∼ 2 ):

1 + n−1 1  1  = 1 + (1 − κn + o(κn)) 2(1 + κn) 2 n 1  1  1  1  1  = 1 + − κ + o(κ ) = 1 + + o , 2 n n n 2 n n

−1 −1 1 −1 which shows that (1 + n )(2 + 2κn) is asymptotically equivalent to 2 (1 + n ). Related to our spectral pictures are the mixing times for the dynamical systems ˜ Tn and Tn. For Tn, we have shown that the second-largest eigenvalue of the Perron-

Frobenius Ln is approximately 1 − 2κn (Corollary 5.4.1 and Lemma 5.1.2). Thus the mixing time for Tn is, ignoring a scale factor, 1 1 ∼ ∼ 2n−1, |log(1 − 2κn)| 2κn

−n using the fact that κn ∼ 2 for large n. ˜ On the other hand, for Tn, Corollary 5.4.2 says that the second-largest eigenvalue −1 −1 of the Perron-Frobenius operator has modulus at most (1 + n )(2 + 2κn) , which for

154 1 −1 ˜ large n is approximately 2 (1 + n ). Thus the mixing time for Tn is (for n ≥ 6) 1 1 ∼ = O(1); 1 −1 |log(1/2) + n−1| log( 2 (1 + n )) much smaller than the mixing time for Tn.

This result matches our intuition: for Tn, taking the perturbation to zero (or n to infinity) leads to no mixing between the two halves, so the mixing time should tend to ˜ infinity, whereas for Tn, taking the perturbation to zero leads to a mixing tent map, and hence the mixing time should approach that for the unperturbed map. The difference in the orders of the mixing times indicates significant dynamical information about how these two systems are distinct; we obtained this information by performing calculations with matrices (without touching the matrices themselves) and some analysis of roots of polynomials. It is the hope that eventually, we will be able to find similar information for cocycles of operators in the random setting, but we are not there quite yet.

5.5 Simultaneous Spectrum via Algebraic Geometry

In the proof of Proposition 5.2.4, we computed the minimal polynomial for An by

finding invariant subspaces and working with the restrictions of An to those subspaces; n+1 n n+1 n the relation An(An −2An −2Jn) = 0 reduced to the relations An(An −2An ∓2I) = 0 on the subspaces E±, respectively, and these one-matrix relations yielded to standard 2 2 single-variable analysis. However, if k(y) = y −1, then we also had k(Jn) = Jn −I = 0, and we could consider the simultaneous equations

n+1 n Anhn(An,Jn) = An(An − 2An − 2Jn) = 0, 2 k(Jn) = Jn − I = 0.

Is it possible to take an algebraic-geometric approach to finding the eigenvalues of

An and Jn without reducing to the single-variable theory? The answer is, perhaps surprisingly, yes! We first define a suitable notion of spectrum for more than one matrix, collect some notation, and remind ourselves of Hilbert’s Nullstellesatz.

d Definition 5.5.1. Let A = (Aj)j=1 ∈ Mn(C) be a finite tuple of diagonalizable com- muting matrices, so that the matrices are simultaneously diagonalizable. We define the simultaneous spectrum of A to be the set of d-tuples S(A) ⊆ Cd given by

 d n S(A) = (λi)j=1 : there exists v ∈ C such that Ajv = λjv for all j .

If P is a collection of polynomials over C in d variables, then we denote the locus,

155 or zero set, of P by

 d d Z(P ) = (λj)j=1 ∈ C : f(λ1, . . . , λd) = 0 for all f ∈ P .

d d We denote the ideal in C[xj]j=1 generated by P by J (P ). If R is a subset of C , then d the ideal of polynomials in C[xj]j=1 that vanish on R is denoted by I(R). If A is a d tuple of commuting d matrices in Mn(C), then the ideal of polynomials in C[xj]j=1 that vanish when evaluated at (A1,...,Ad) is denoted by M(A)(M for “minimal”). If I is an ideal, then rad(I) denotes the radical of the ideal.

Theorem 5.5.2 (Hilbert’s Nullstellensatz [11, Section 15.2]). Let J be an ideal in C[x1, . . . , xd]. Then rad(J ) = I(Z(J )). Moreover, there is an order-reversing bijec- tion between subsets of Cd and the ideals of polynomials that vanish on those subsets. The result we will prove in this section is the following proposition; it is a replace- ment for Proposition 5.2.4, because we obtain the eigenvalues of An (and with them, the symmetry properties of the eigenvectors for An after identifying the eigenspaces for

Jn), and in the proof we use some of the more detailed information about the kernel.

Proposition 5.5.3. The simultaneous spectrum of An and Jn is the locus of the set of polynomials {xhn(x, y), k(y)}; that is, we have

S(An,Jn) = {(x, y): xhn(x, y) = 0 = k(y)} = Z(xhn(x, y), k(y)).

The proof will proceed via two lemmas; first, we give an indication of what holds generally, for arbitrary sets of polynomials that annihilate a tuple of commuting ma- trices.

d Lemma 5.5.4. Let A = (Aj)j=1 be a tuple of commuting matrices in MN (C), and let P be a subset of M(A).

N 1. If v ∈ C is a shared eigenvector for A with Ajv = λjv for Aj ∈ A and f ∈ C[x1, . . . , xd], then f(A1,...,Ad)v = f(λ1, . . . , λd)v.

2. We have J (P ) ⊆ M(A) ⊆ I(S(A)).

3. We have J (P ) ⊆ rad(J (P )) = I(Z(P )) ⊆ I(S(A)).

Moreover, if the matrices in A are all diagonalizable, then M(A) = I(S(A)).

N Proof. If v ∈ C is a shared eigenvector for A with eigenvalues λj corresponding to Aj ∈ A, then because the matrices in A commute, it is a straightforward calculation to see that f(A1,...,Ad)v = f(λ1, . . . , λd)v.

156 Next, for the second part of the lemma, since P is a subset of the ideal M(A), the ideal J (P ) generated by P is also a subset of M(A). Suppose that λ ∈ S(A) with corresponding shared eigenvector v and f ∈ M(A). By the first part of the lemma, we have:

f(λ)v = f(A1,...,Ad)v = 0, since f annihilates (A1,...,Ad). The vector v is non-zero, so f(λ) = 0. This fact holds for all λ ∈ S(A) and f ∈ M(A), so M(A) ⊆ I(S(A)). For the third part of the lemma, note that the radical of any ideal contains at least the ideal, so J (P ) ⊆ rad(J (P )). Hilbert’s Nullstellensatz provides the equality rad(J (P )) = I(Z(P )). If λ ∈ S(A) and f ∈ P , then by the second part of the lemma we see that f annihilates λ and so λ ∈ Z(P ), hence S(A) ⊆ Z(P ). Passing to the ideals of polynomials that vanish on those sets reverses the inclusion, so we obtain I(Z(P )) ⊆ I(S(A)). Finally, suppose that the matrices in A are all diagonalizable; then A is simulta- N neously diagonalizable and there is a basis of shared eigenvectors {vi}i=1 for A corre- sponding to λi ∈ S(A). Suppose that f ∈ I(S(A)); then for all i, we have

f(A1,...,Ad)vi = f(λi)vi = 0, since f vanishes at elements of the simultaneous spectrum. But then f(A1,...,Ad) annihilates every basis element, so that f(A1,...,Ad) = 0, i.e. f ∈ M(A).

Remark 5.5.5. The extent to which M(A) is a strict subset of I(S(A)) is the barrier to simultaneous diagonalizability. In the case of only one matrix A (so that A is the 1-tuple A), M(A) is the ideal generated by the minimal polynomial for A and I(S(A)) is the ideal generated by the monic separable polynomial with the eigenvalues of A for roots. The matrix A is diagonalizable if and only if its minimal polynomial is separable, if and only if M(A) and I(S(A)) are equal. For a set of polynomials P that annihilate a set of commuting matrices A, the extent to which I(Z(P )) is a strict subset of M(A) = I(S(A)) is the barrier to identifying the simultaneous spectrum of A as the locus of P ; strict inclusion of I(Z(P )) in I(S(A)) corresponds to strict inclusion of S(A) in Z(P ). This fact can be seen as P capturing only partial information about A unless P contains minimal relations for A. The case of A being only one matrix A is straightforward, even if A is not diagonalizable, because the spectrum of A is exactly the zero set of its minimal polynomial, which is also the zero set of the monic separable polynomial with the eigenvalues of A for roots; this latter polynomial is the generator for the radical of the ideal generated by the minimal polynomial. There is no general relation between I(Z(P )) and M(A) unless A is simultaneously diagonalizable, where we have I(Z(P )) ⊆ M(A) by Lemma 5.5.4.

157 Our goal in Proposition 5.5.3 is to show that the simultaneous spectrum of (An,Jn) is given by the locus of {xhn(x, y), k(y)}. We already know that these polynomials both annihilate (An,Jn), so in light of Lemma 5.5.4 and Remark 5.5.5, we must show that M(An,Jn) is contained in I(Z(xhn(x, y), k(y))). The next lemma is the key technical detail that allows us to do just that. For information on Gr¨obnerbases and Buchberger’s criterion, see [11, Section 9.6].

Lemma 5.5.6. We have M(An,Jn) ⊆ J (xhn(x, y), k(y)).

Proof. First, we show that {xhn(x, y), k(y)} is a reduced Gr¨obnerbasis for the ideal

J (xhn(x, y), k(y)), using Buchberger’s criterion. Let x > y generate the lexicographic monomial ordering and observe that the leading terms of xhn(x, y) and k(y) are n+2 2 LT (xhn) = x and LT (k) = y , respectively. The monic least common multiple m of xn+2 and y2 is xn+2y2, so we try to write m m S(xhn(x, y), k(y)) := xhn(x, y) − k(y) LT (xhn) LT (k) in terms of xhn(x, y) and k(y) without any monomial terms larger than the leading term xn+2:

m m 2 n+2 xhn(x, y) − k(y) = y xhn(x, y) − x k(y) LT (xhn) LT (k) = xn+2y2 − 2xn+1y2 − 2xy3 − xn+2y2 + xn+2 = xn+2 − 2xn+1y2 − 2xy3 n+1 n+1 2 3 = xhn(x, y) + 2x + 2xy − 2x y − 2xy n+1 = xhn(x, y) − (2x + 2xy)k(y).

Buchberger’s criterion applies, so {xhn(x, y), k(y)} is a Gr¨obnerbasis for the ideal

J (xhn(x, y), k(y)); it is reduced because each element has leading coefficient 1 and each leading term is not a multiple of the other leading term.

Suppose that f ∈ M(An,Jn). Since {xhn(x, y), k(y)} is a reduced Gr¨obnerbasis for J (xhn(x, y), k(y)), we may reduce f with respect to {xhn(x, y), k(y)} and obtain a unique remainder r(x, y) such that

f(x, y) = a(x, y)xhn(x, y) + b(x, y)k(y) + r(x, y) for some polynomials a and b and such that degx(r) ≤ n+1 and degy(r1) ≤ 1 (Theorem 23 in Section 9.6 of [11]). Then we know that f ∈ J (xhn(x, y), k(y)) if and only if the remainder r(x, y) is 0. For notation, let r(x, y) = c0 + c1y + xr1(x, y), for c0, c1 ∈ C and r1 ∈ C[x, y] with degx(r1) ≤ n and degy(r1) ≤ 1. Applying r(An,Jn) to the vector v = e1+. . . en−(en+1+en+2) ∈ ker(An)\{0}, we obtain c0v+c1Jnv = 0. The two vectors v and Jnv are linearly independent, so c0 = c1 = 0, and hence r(x, y) = xr1(x, y).

158 0 0 Now, we let v = en+3 + en+4 − (en+6 + ··· + e2n+4) and observe that An(v ) = en+2 0 by inspecting the graph of Tn. We also see that Anr1(An,Jn)v = r1(An,Jn)en+2, and by evaluating f at An and Jn we have that Anr1(An,Jn) = r(An,Jn) = 0. Putting these equations together, we obtain

n 1 0 X X i j 0 = Anr1(An,Jn)v = r1(An,Jn)en+2 = dijAnJnen+2, i=0 j=0 where the coefficients of r1 are the dij. Moreover, by recalling the computation in i j Lemma 5.2.1 we know that the set of AnJnen+2, with i between 0 and n and j equal to 0 or 1, is linearly independent. Therefore each of the dij are zero, so r1 is the zero polynomial, as is r. We now see that f ∈ J (xhn(x, y), k(y)), so the lemma is proved.

Proof of Proposition 5.5.3. We wish to use Lemma 5.5.4 in its full power, so we need to show that An and Jn are both diagonalizable. We have already shown that Jn is diagonalizable in Lemma 5.2.2. For An, observe that we have

n 2n 2 2 2 xhn(x, y)(x (x − 2) + 2y) + 4xk(y) = x(x (x − 2) − 4y ) + 4xy − 4x 2n 2 = x(x (x − 2) − 4) = xfn(x)gn(x), so that xfn(x)gn(x) is an element of J (An,Jn), hence Anfn(An)gn(An) = 0. The mini- mal polynomial mAn (x) therefore divides xfn(x)gn(x), but by Lemma 5.2.3 xfn(x)gn(x) is separable, so mAn (x) is also separable. Thus An is diagonalizable. We then combine Lemmas 5.5.4 and 5.5.6, setting P = {xhn(x, y), k(y)}, to see that

I(S(An,Jn)) = M(An,Jn)

⊆ J (xhn(x, y), k(y))

⊆ rad(J (xhn(x, y), k(y)))

= I(Z(xhn(x, y), k(y))) ⊆ I(S(An,Jn)), so all of these ideals are equal. By Hilbert’s Nullstellensatz, we have that

S(An,Jn) = Z(xhn(x, y), k(y)); that is, the simultaneous spectrum of An and Jn is exactly the locus of xhn(x, y) and k(y). The proof is complete.

We see, therefore, that the pairs of eigenvalues for An and Jn are exactly the locus of 2 {xhn(x, y), y −1}. When we solve for the roots of the two polynomials simultaneously,

159 we obtain the eigenvalues of An, and each eigenvalue is paired with 1 or −1; the parity of the second eigenvalue (for Jn) indicates the anti/symmetric property of the eigenvectors, since the eigenspaces for Jn are the even and the odd functions. The abstract algebraic geometry perspective is another way to see the problem; our initial proof was much less high-tech, but potentially more direct. The subsequent computations regarding the locations of the eigenvalues in Proposition 5.2.5 are not affected by the change, except that we would need to compute the minimal polynomials for An restricted to the eigenspaces of Jn; substituting y = ±1 into xhn(x, y) = 0, we see what the minimal polynomial is, so it is a quick fix.

5.6 Two-Parameter Markov Paired Tent Maps

It turns out that there is actually a countable family of Markov paired tent maps

Tκn,ζm , where κn and ζm are not necessarily equal. The proof is similar, but instead we have to consider the different images of ±1/2. Unfortunately, while we can still compute the minimal and characteristic polynomials for the matrix Mκn,ζm as in Section 5.2, the maps no longer have constant slope over all branches up to sign. Thus the representation of the Perron-Frobenius operator is not a constant multiple times the adjacency matrix, which prevents the remainder of the analysis being tractable. We can still do the computations, however, and for choices of parameters κn and ζm one can use numerical methods to compute the spectrum. If we have n, m, κ, ζ such that the following system of equations is satisfied, then by Lemma 5.1.3 we see that Tκ,ζ will be Markov:

n n Tκ,ζ (ζ) = 1 − (2(1 + κ)) ζ = 0, m m Tκ,ζ (κ) = 1 − (2(1 + ζ)) κ = 0.

We are just pushing forward ±1/2 under the map, and then applying the appropriate branch of Tκ,ζ until we hit zero, just as in Lemma 5.1.4.

2 2 Lemma 5.6.1. For each pair (n, m) ∈ Z≥1, there exists a pair (κn,m, ζn,m) ∈ (0, 1/2) such that κn,m tends to zero as m tends to infinity, ζn,m tends to zero as n tends to infinifty, and Tκn,ζm is Markov. Proof. Solve the above system of equations for ζ to see that:

1 1 c (κ) := = ζ = − 1 =: d (κ). n (2(1 + κ))n 2κ1/m m

The two functions of κ are plotted for specific n and m in Figure 5.12. −n −n We know that cn(κ) ∈ [4 , 2 ] and dm(κ) ∈ [−1/2, ∞), and some basic calculus −m indicates that both cn and dm are decreasing. Moreover, on the interval (0, 2 ] we

160 0.6

0.5

0.4

0.3

0.2

0.1

0 0.1 0.15 0.2 0.25

Figure 5.12: The steeper function (in blue) is dm(κ) and the flatter function (in red) is cn(κ), for m = 2 and n = 3. The x-coordinate of the intersection is the κ value corresponding to (n, m) and the y-coordinate is the ζ value. have −2n −n c0 (κ) = ≥ , n (2(1 + κ))n+1 2n −1 1 −2m d0 (κ) = · ≤ . m 2m κ1+1/m m For all n, m ≥ 1, we have that 2m/m > n/2n (since that is equivalent to 2n+m > nm, a true statement for all pairs of positive integers), and so we have that on (0, 2−m),

2m n (c − d )0(κ) ≥ − > 0. n m m 2n

−m −m Since cn − dm is negative for κ close to zero and positive at 2 (since dm(2 ) = 0), −m we see that there is a unique solution κn,m to cn(κ) = dm(κ) in (0, 2 ). There are no −m other solutions on (0, 1), because dm is negative and cn is positive for x > 2 . Let

ζn,m = cn(κn,m). Observe that because the equations are symmetric in κ and ζ with respect to n and m, we know that κn,m = cm(ζn,m). −m Since κn,m ∈ (0, 2 ), we know that as m tends to infinity, κn,m must tend to 0, −n and similarly ζn,m ∈ (0, 2 ) tends to 0 as n tends to infinity. By Lemma 5.1.3, since

κn,m and ζn,m are the images of ∓1/2, respectively, and they eventually are mapped to zero, we see that Tκn,m,ζn,m is Markov.

161 Write Tn,m, Ln,m, and Mn,m for Tκn,m,ζn,m , Lκn,m,ζn,m , and Mκn,m,ζn,m , where M is the matrix representation of the restriction of the Perron-Frobenius operator L to the finite-dimensional subspace determined by the Markov partition.

Lemma 5.6.2. Fix n, m ∈ Z≥1. The Markov partition for Tn,m is made up of two sets of intervals. For the intervals in [−1, 0], the partition contains the first n + 2 intervals of the partition from Lemma 5.1.4, using ζ and n; for the intervals in [0, 1], the partition contains the last m + 2 intervals from Lemma 5.1.4, using κ and m. The matrix Mn,m has a two-dimensional kernel, and has minimal polynomial

  1   1  κζ  m (x) = x xn+m x − x − − . n,m 1 + κ 1 + ζ (1 + κ)(1 + ζ)

Proof. The partition follows from Lemma 5.1.2 and Lemma 5.6.1, looking at the action of Tn,m. For the minimal polynomial, we follow the same procedure as in Lemma 5.2.1.

Since n and m are fixed, we will suppress them in the notation for M = Mn,m, κ = κn,m, and ζ = ζn,m. Letting e1, . . . , en+2, en+3, . . . , en+m+4 be the basis vectors ordered from left to right along [−1, 1] arising from the Markov partition and performing the same computations as in Lemma 5.2.1, we obtain:

1 M ne = (e + ··· + e ) = ζ(e + ··· + e ), n+2 (2(1 + κ))n 1 n+2 1 n+2 ζ 1 ζ M n+1e = (e + ··· + e ) = M ne + e , n+2 1 + κ 1 n+3 1 + κ n+2 1 + κ n+3 1 M me = (e + ··· + e ) = κ(e + ··· + e ), n+3 (2(1 + ζ))m n+3 n+m+2 n+3 n+m+2 κ 1 κ M m+1e = (e + ··· + e ) = M me + e . n+3 1 + ζ n+3 n+m+2 1 + ζ n+3 1 + ζ n+2

Combining these equations, we obtain:

 1   1  M m+1 − M m M n+1 − M n e 1 + ζ 1 + κ n+2  1   ζ  = M m+1 − M m e 1 + ζ 1 + κ n+3 ζ κ = · e , 1 + κ 1 + ζ n+2 which simplifies to

 2 + κ + ζ M n+m+2 − M n+m+1 (1 + ζ)(1 + κ) 1 κζ  + M n+m − I e = 0. (1 + ζ)(1 + κ) (1 + ζ)(1 + κ) n+2

162 i The same argument as in Lemma 5.2.1 says that the set of M en+2 for i = 0, . . . , n + m+1 is linearly independent, and factoring the above polynomial gives the polynomial written in the lemma statement, without the x term. Of course, the vectors e1 + ··· + en − (en+1 + en+2) and −(en+3 + en+4) + en+5 + . . . en+m+2 are still kernel vectors for M, so the kernel is two-dimensional, because the minimal polynomial of M with respect to en+2 is (n + m + 2)-dimensional and the space is (n + m + 4)-dimensional. The minimal polynomial of M is hence

  1   1  κζ  m (x) = x xn+m x − x − − . n,m 1 + κ 1 + ζ (1 + κ)(1 + ζ)

The last thing to note is that when n = m, we have κn,n = ζn,n, because the defining equations are the same as for the symmetric case in Lemma 5.1.4; we therefore see that

κn,n = κn is the same as in the symmetric case. Also recall that in the symmetric case, −1 Mn = (2(1 + κn)) An. In general, if m(x) is the minimal polynomial for some matrix A with degree k and c 6= 0 is a scalar, then ckm(x/c) is the minimal polynomial for cA: it is the monic polynomial that generates the ideal of polynomials which annihilate cA, since m generates the ideal of polynomials which annihilate A. We can apply this to −1 our case, with c = (2(1 + κn)) , to show that the minimal polynomial of Mn is equal to: 1 minMn (x) = 2n+3 mn (2(1 + κn)x) (2(1 + κn)) 2 κn 2n 2n 2  = 3 · 2(1 + κn)x (2(1 + κ)) x (2(1 + κ)x − 2) − 4 (2(1 + κn)) 2 2n  2 ! κnx x 2 1 = 2 2 (2(1 + κn)) x − − 4 (2(1 + κn)) κn 1 + κ  2 2 ! 2n 1 κn = x x x − − 2 = mn,n(x). 1 + κn (1 + κn)

Therefore we see that the minimal polynomial in the asymmetric case matches what it should be in the symmetric case.

163 Chapter 6 Conclusion

As a brief summary, after developing background and proving preliminary lemmas, we proved Theorem 2.4.3, a new generalized Perron-Frobenius theorem for cocycles of “positive” operators on a Banach space (ones that preserve and occasionally contract a cone). It is not a replacement for direct techniques, as the examples in Section 2.6 demonstrated, but it is powerful and allows for theoretical and asymptotic uses, as shown in Sections 4.5 and 4.6. The fact that any cocycle satisfying the hypotheses of the theorem is automatically quasi-compact makes the theorem a good tool for using Multiplicative Ergodic Theorems, especially because the two sets of hypotheses are the two main sets of measurability assumptions for the theory at the current time [17, 19]. We then developed a particular collection of results and theory for bounded vari- ation spaces, and proved a new Lasota-Yorke-type inequality that puts together two of the best aspects of the inequalities proved by Rychlik [48] and by Eslami and G´ora [12]: the balanced aspect and the freedom to choose auxiliary intervals, and the con- tributions of the hanging points, respectively. The inequality played a major role in allowing us to actually show that the Perron-Frobenius operators for cocycles of paired tent maps all preserved the same cone, because we could use it with different choices of auxiliary intervals (the J ) to obtain a uniform bound across an entire family of maps. The two previous results were applied to the cocycles of paired tent maps in Chapter 4, though we also needed to develop other theory in order to fruitfully apply the results. Theorem 4.5.1 was useful at a theoretical level, to show that all of these cocycles are quasi-compact, and Theorem 5.0.1 showed that there exists a linear upper bound for the second-largest Lyapunov exponent for a cocycle of perturbations. These are mixing results for the non-autonomous random dynamical system! The fact that the upper bound is linear in the perturbation parameter means that we have at least linear response in the second-largest Lyapunov exponent; as the perturbation parameter goes to zero, the second-largest Lyapunov exponent could tend to zero more slowly than linearly, but the rate is no faster than linear. We then performed a direct computation in Chapter 5 to show that the asymptotics for the perturbation are actually sharp, in a sense.

164 There are some directions in which to head after this work. How much farther can we stretch the application of Theorem 2.4.3? Are there more complex systems that can be shown to be quasi-compact with reasonable spectral gaps? Are there other types of Banach spaces on which transfer operators are well-behaved? (The answer to that is “yes” [4], but we would also need a cone to be preserved and contracted, so there is more work to do.) Can the beautiful Markov computations point us in the right direction to try to find a lower bound for λ2, in general (unlikely, but maybe it is tangential or adjacent to a better idea)? Is general Markov structure for a cocycle enough to say something about λ2 or smaller eigenvalues? Is it too restrictive to only consider a fixed cone that gets contracted (there are techniques where there is a sequence of cones which are mapped into one another)? What is the relationship between some of this work and open dynamical systems (see [10, 18] for the notion of meta-stable states and how they seem to relate to Lyapunov exponents)? What is the next step past the Perron-Frobenius theorem, in terms of handling smaller eigenvalues or smaller Lyapunov exponents? All of those questions are interesting to ask; hopefully they are interesting for someone to try to solve.

165 Appendix A Assorted Lemmas, Proofs, and Computations

A.1 Miscellaneous Ergodic Theory

Lemma A.1.1. Let A be an invertible matrix cocycle over some invertible ergodic probability-preserving transformation (Ω, µ, σ), suppose that log kA(1, ω)k is integrable, and let λ1, . . . , λd be the Lyapunov exponents for A, counted with multiplicity. Then

d X Z λk = log |det(A(1, ω))| dµ. k=1 Ω

Proof. We recall that the Lyapunov exponents are the logarithms of the eigenvalues of the limiting matrix of A(n, ω)T A(n, ω)1/2n (which are σ-invariant functions of ω, hence almost-everywhere constant), by the Multiplicative Ergodic Theorem (see [44] or [16]). Moreover, we can write

A(n, ω)T A(n, ω) = P (n, ω)D(n, ω)P (n, ω)T , where P (n, ω) is orthogonal, and D(n, ω) is the diagonal matrix of non-negative eigen- values rk(n, ω); taking the determinant of this matrix gives us:

d Y T  rk(n, ω) = det(D(n, ω)) = det P (n, ω)D(n, ω)P (n, ω) k=1 = det A(n, ω)T A(n, ω) = det A(1, ω)T ...A(1, σn−1(ω))T A(σn−1(ω)) ...A(1, ω) n Y = det(A(1, σj−1(ω)))2, j=1 by using properties of the determinant. The absolute value of the determinant of

166 A(1, ω) is bounded by d2(c kAk)d2 , where c > 0 is some scale factor that relates the norm on Mn(R) being used to the supremum norm of the coefficients, as all norms 2 are equivalent on Mn(R). Hence log |det(A(1, ω))| is bounded by 2 log(d) + d log(c) + d2 log kA(1, ω)k, which is integrable on Ω. Thus, we have:

d d !  1/2n X Y λk T  λk = log e = log lim det A(n, ω) A(n, ω) n→∞ k=1 k=1 1/2n 1/2n −1 = lim log det P (n, ω)diag(r1(n, ω) , . . . , rd(n, ω) )P (n, ω) n→∞ 1/2n 1/2n  = lim log det diag(r1(n, ω) , . . . , rd(n, ω) ) n→∞ d !  d !1/2 Y 1/2n 1 Y = lim log rk(n, ω) = lim log  rk(n, ω)  n→∞ n→∞ n k=1 k=1 n n 1 Y k−1 1 X j = lim log det(A(σ (ω))) = lim log det(A(σ (ω))) n→∞ n n→∞ n j=1 j=1 Z = log |det(A(ω))| dµ, Ω where the last equality is by Birkhoff’s theorem.

Lemma A.1.2. Let (Ω, B, µ, σ) be an invertible ergodic probability-preserving trans- 1 n formation and let f :Ω → R be measurable. Then lim supn→∞ n f ◦ σ ≥ 0 on a set of + ( 1 n full measure in Ω. Moreover, if f ∈ L µ), then lim supn→∞ n f ◦ σ = 0. Proof. For the first part, find L > 0 such that f −1[−L, L] has positive measure; by Poincar´erecurrence, find a full measure subset G of f −1[−L, L] such that every point in G returns to f −1[−L, L] infinitely often. By invertibility and ergodicity of σ, we have that [ −n Ω1 = σ (G) n∈Z

nk has full measure. Given ω ∈ Ω1, there exists a sequence nk such that σ (ω) ∈ f −1[−L, L] for all k, and so

f(σnk (ω)) −L ≥ −→ 0. nk nk k→∞

1 n Hence lim supn→∞ n f ◦ σ ≥ 0 on Ω1.

167 + 1 If f ∈ L (µ), then by Birkhoff’s theorem, for µ-almost every ω ∈ Ω1 we have:

n n−1 ! f +(σn(ω)) n + 1 1 X 1 X lim = lim f +(σi(ω)) − f +(σi(ω)) n→∞ n n→∞ n n + 1 n i=0 i=0 Z Z = f + dµ − f + dµ = 0. Ω Ω Hence we have 1 1 0 ≤ lim sup f ◦ σn ≤ lim sup f + ◦ σn = 0 n→∞ n n→∞ n for µ-almost every ω ∈ Ω, as required.

Lemma A.1.3. Let (Ω, B, µ, σ) be an ergodic probability-preserving transformation, let G ∈ B have positive measure, and let N :Ω → Z≥1 ∪ {∞} be the first entry time of ω into G: N(ω) = inf {n ≥ 1 : σn(ω) ∈ G}. Then N is measurable and for µ-almost every ω, N(σn(ω))/n −→ 0. n→∞

Proof. For k ∈ Z≥1, we have that

k−1 [ N −1{k} = σ−k(G) \ σ−i(G), i=1 which is clearly measurable. Then the complement of the union of these sets is N −1{∞}, so that N is measurable. Now, by Poincar´eRecurrence, let G˜ ⊆ G be the set of points in G that return ˜ S∞ −i ˜ ˜ infinitely often to G under σ. Then Ω = n=1 σ (G) has measure 1 and for ω ∈ Ω, we see that N(ω) is finite and σN(ω)(ω) ∈ G. Let k(n) be the number of times that σi(ω)

nG is in G, for 1 ≤ i ≤ n, let nG denote the first return time to G, and let σG = σ be the first return map. Then, for n ≥ N(ω), we have:

k(n)−1 n X j N(ω) n + N(σ (ω)) = N(ω) + nG(σG(σ (ω))). j=0

This equality holds because the first entry time is the same as the first return time for points in G˜. Applying Birkhoff’s theorem, we see that for almost every ω ∈ Ω,˜

n k(n) 1 X i −1 = χG(σ (ω)) −→ µ(σ (G)) = µ(G). n n n→∞ i=1

Applying Birkhoff’s theorem and Kac’s lemma to the first return map system (σG, µG)

168 gives us k(n)−1 Z 1 X j N(ω) 1 nG(σ (σ (ω))) −→ nG dµG = . k(n) G n→∞ µ(G) j=0 G Thus, we see that

k(n)−1 n + N(σn(ω)) N(ω) k(n) 1 X 1 ≤ = + n (σj (σN(ω)(ω))) n n n k(n) G G j=0 1 −→ 0 + µ(G) · = 1. n→∞ µ(G)

Hence N(σn(ω))/n converges to 0 as n tends to infinity, for µ-almost every ω. Lemma A.1.4. Let (Ω, B, µ, σ) be an ergodic probability-preserving transformation, n let N :Ω → Z≥1 be such that (N ◦ σ )/n converges to 0 µ-almost everywhere, let −1 Pn−1 i h :Ω → R be measurable, and suppose that n i=0 h(σ (ω)) converges to λ ∈ R for µ-almost every ω. Then

N(σn(ω))−1 1 X h(σi+n(ω)) −→ 0. n n→∞ i=0

Proof. Observe that for any ω and n, we have:

n−1 N(σn(ω))−1 1 X 1 X h(σi(ω)) + h(σi+n(ω)) n n i=0 i=0 n+N(σn(ω)) n + N(σn(ω)) 1 X = · h(σi(ω)). n n + N(σn(ω)) i=0

Since N(σn(ω))/n tends to 0, we see that the desired term converges to 0, because the two classical Birkhoff sums both converge to λ and (n + N(σn(ω)))/n converges to 1, by Lemma A.1.3. Lemma A.1.5. Let (Ω, B, µ, σ) be an ergodic probability-preserving transformation, and let f :Ω → R be a measurable function. Suppose that for µ-almost every ω, the n−1 1 X Birkhoff sums f(σi(ω)) converge to λ ∈ . n R i=0 1. If f ≥ 0, then f is integrable.

2. If f + is integrable, then f is integrable. Proof. First, assume that f ≥ 0, and suppose for contradiction that f is not integrable. −1 R R Let AM = f [0,M]; then f dµ ≤ M, but lim f dµ = ∞. Let  > 0, and AM M→∞ AM

169 R find A = AM such that A f dµ > λ + . By choice of M, f · χA is integrable, so by Birkhoff’s theorem and the hypothesis, we see that for µ-almost every ω ∈ Ω,

n−1 n−1 1 X 1 X 1 (f · (1 − χ ))(σi(ω)) = f(σi(ω)) − (f · χ )(σi(ω)) n A n Pn−1 A i=0 i=0 i=0 Z −→ λ − f dµ < λ − (λ + ) = − < 0. n→∞ A

But f ≥ 0, so we arrive at a contradiction, because the Birkhoff sum for f · (1 − χA) must be non-negative. The only assumption was that f is not integrable, so f must be integrable. Now, suppose that f takes negative values on a set of positive measure, and that f + is integrable. Writing f = f + − f −, we have

n−1 n−1 n−1 1 X 1 X 1 X Z f −(σi(ω)) = f +(σi(ω)) − f(σi(ω)) −→ f dµ − λ ∈ [0, ∞) n n n n→∞ −1 i=0 i=0 i=0 f [0,∞) for µ-almost every ω, again by Birkhoff and the hypothesis. We apply the previous part to see that f − is integrable, so f must also be integrable.

A.2 Miscellaneous Tools

Lemma A.2.1. We have the following asymptotic properties:

1. if f, g, h : (0, t) → R \{0} are functions with limx→0+ h(x)f(x) = L and g(x) ∼ f(x), then limx→0+ h(x)g(x) = L also;

2. if $f(x) \to 0$ as $x \to 0^+$, then $\log(1 + f(x)) \sim f(x)$;

3. if $f > 0$, $f(x) \to 0$ as $x \to 0^+$, and $b > 0$, then

\[ \log\!\left(\tanh\!\left(\tfrac{1}{4}(b - 2\log(f(x)))\right)\right) \sim -2e^{-b/2} f(x); \]

4. for all $b > 0$, we have $\lim_{x \to 0^+} \log(x)\log(1 + bx) = 0$.

Proof. For the first property, we simply compute:

\[ \lim_{x \to 0^+} h(x)g(x) = \lim_{x \to 0^+} h(x)f(x) \cdot \frac{g(x)}{f(x)} = \lim_{x \to 0^+} h(x)f(x) \cdot \lim_{x \to 0^+} \frac{g(x)}{f(x)} = L \cdot 1 = L. \]

Next, we note that for $|y| < 1$, we have the power series representation

\[ \log(1 + y) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1} y^n}{n} = y + \sum_{n=2}^{\infty} \frac{(-1)^{n+1} y^n}{n}. \]

Thus, taking y → 0 we obtain

\[ \lim_{y \to 0} \frac{\log(1 + y)}{y} = \lim_{y \to 0} \left( 1 + \sum_{n=2}^{\infty} \frac{(-1)^{n+1} y^{n-1}}{n} \right) = 1. \]

Substituting $f(x)$ in for $y$ yields the property. For the third part, let $b > 0$ and $f(x) \to 0$ as $x \to 0^+$. Let $y = \tfrac{1}{4}(b - 2\log(f(x)))$, and expand $\log(\tanh(y))$ using the definition of $\tanh(y)$:

\[ \log(\tanh(y)) = \log\left( \frac{e^y - e^{-y}}{e^y + e^{-y}} \right) = \log\left( \frac{1 - e^{-2y}}{1 + e^{-2y}} \right) = \log(1 - e^{-2y}) - \log(1 + e^{-2y}). \]

Substituting in for $y$ and using part 2 gives us:
\begin{align*} \log\left(\tanh\left(\tfrac{1}{4}(b - 2\log(f(x)))\right)\right) &= \log\left(1 - e^{\frac{1}{2}(2\log(f(x)) - b)}\right) - \log\left(1 + e^{\frac{1}{2}(2\log(f(x)) - b)}\right) \\ &= \log\left(1 - e^{-b/2} f(x)\right) - \log\left(1 + e^{-b/2} f(x)\right) \\ &\sim -e^{-b/2} f(x) - e^{-b/2} f(x) = -2e^{-b/2} f(x), \end{align*}
which is the desired result.

Finally, for $b > 0$, we know by the second part that $\log(1 + bx) \sim bx$. For $u \to \infty$, we compute
\[ u e^{-u} = \frac{u}{e^u} = \frac{u}{1 + u + u^2/2 + \cdots} \leq \frac{u}{1 + u + u^2/2} \xrightarrow[u \to \infty]{} 0, \]
using the power series representation of $e^u$, so we have, setting $x = e^{-u}$,

\[ \lim_{x \to 0^+} bx \log(x) = -b \lim_{u \to \infty} u e^{-u} = 0. \]

Thus, by the first part of the lemma, $\lim_{x \to 0^+} \log(x)\log(1 + bx) = \lim_{x \to 0^+} bx\log(x) = 0$.
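The third asymptotic is the least transparent of the four, and it is easy to sanity-check numerically. A minimal sketch, with an arbitrary choice of $b$ and with small test numbers standing in for the values $f(x)$:

```python
# Sketch: log(tanh((b - 2 log f)/4)) / (-2 e^{-b/2} f) should tend to 1 as f -> 0+.
import math

b = 1.7                                  # any b > 0 works here
for f in (1e-1, 1e-3, 1e-6, 1e-9):       # stand-ins for f(x) as x -> 0+
    lhs = math.log(math.tanh(0.25 * (b - 2.0 * math.log(f))))
    rhs = -2.0 * math.exp(-b / 2.0) * f
    print(f, lhs / rhs)                  # the ratios approach 1
```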

Lemma A.2.2. For $0 < \lambda \leq \mu < \infty$ and $t \geq 0$, we have that
\[ \frac{(\mu - \lambda)e^t}{(e^t + \lambda)(e^t + \mu)} \leq \tanh\left( \frac{1}{4} \log\frac{\mu}{\lambda} \right). \]

Proof. If $\lambda = \mu$, then both sides of the inequality are 0, so it is true. Assume that $\lambda < \mu$. First, we add zero in two locations to get

\[ \frac{(\mu - \lambda)e^t}{(e^t + \lambda)(e^t + \mu)} = \frac{e^t}{e^t + \lambda} - \frac{e^t}{e^t + \mu} = 1 - \frac{\lambda}{e^t + \lambda} - 1 + \frac{\mu}{e^t + \mu} = \frac{\mu}{e^t + \mu} - \frac{\lambda}{e^t + \lambda}. \]

Call this resulting function $g(t)$. Maximizing $g(t)$ for non-negative $t$ is equivalent to maximizing $f(x) = g(\log(x))$ for $x \geq 1$, so we investigate

\[ f(x) = \frac{\mu}{x + \mu} - \frac{\lambda}{x + \lambda}. \]

We have
\[ f'(x) = \frac{\lambda}{(x + \lambda)^2} - \frac{\mu}{(x + \mu)^2}, \]
and $f'(x) = 0$ if and only if
\[ \frac{(x + \mu)^2}{(x + \lambda)^2} = \frac{\mu}{\lambda}. \]
Taking square roots and noting that $x$, $\mu$, $\lambda$ are all positive yields

\[ \frac{x + \mu}{x + \lambda} = \sqrt{\frac{\mu}{\lambda}}, \]
and this eventually rearranges to
\[ x = \frac{\mu - \sqrt{\mu\lambda}}{\sqrt{\mu/\lambda} - 1} = \sqrt{\mu\lambda}. \]

To see if this is a maximum, rewrite $f'(x)$ as

\[ f'(x) = \frac{1}{\left( \sqrt{\lambda} + \frac{x}{\sqrt{\lambda}} \right)^2} - \frac{1}{\left( \sqrt{\mu} + \frac{x}{\sqrt{\mu}} \right)^2}. \]
Setting $f'(x) > 0$ and rearranging yields $x < \sqrt{\mu\lambda}$. Similarly, $f'(x) < 0$ yields $x > \sqrt{\mu\lambda}$, hence $x = \sqrt{\mu\lambda}$ is indeed a maximum for $f(x)$. Moreover, this value is a maximum for

all non-negative $x$, so we obtain:

\begin{align*} \frac{(\mu - \lambda)e^t}{(e^t + \lambda)(e^t + \mu)} = g(t) = f(e^t) \leq f\!\left(\sqrt{\mu\lambda}\right) &= \frac{\mu}{\sqrt{\mu\lambda} + \mu} - \frac{\lambda}{\sqrt{\mu\lambda} + \lambda} = \frac{\sqrt{\mu} - \sqrt{\lambda}}{\sqrt{\mu} + \sqrt{\lambda}} \\ &= \frac{1 - e^{-\frac{1}{2}\log(\mu/\lambda)}}{1 + e^{-\frac{1}{2}\log(\mu/\lambda)}} = \tanh\left( \frac{1}{4} \log\frac{\mu}{\lambda} \right), \end{align*}
where the last equality uses the equivalent expression $\tanh(z) = \frac{1 - e^{-2z}}{1 + e^{-2z}}$ with $z = \frac{1}{4}\log\frac{\mu}{\lambda}$.
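Both the inequality and the location of the maximizer are easy to test numerically; here is a minimal sketch with a few arbitrary test pairs $(\lambda, \mu)$:

```python
# Sketch: check g(t) <= tanh(log(mu/lam)/4) on a grid of t >= 0, and that the
# bound is attained at e^t = sqrt(mu * lam) whenever that point has t >= 0.
import math

def g(t, lam, mu):
    et = math.exp(t)
    return (mu - lam) * et / ((et + lam) * (et + mu))

for lam, mu in ((0.5, 0.5), (0.5, 2.0), (1.0, 10.0), (0.2, 30.0)):
    bound = math.tanh(0.25 * math.log(mu / lam))
    worst = max(g(0.01 * k, lam, mu) for k in range(2000))
    print(lam, mu, worst <= bound + 1e-12)
    if mu * lam >= 1.0:                  # then t* = log(sqrt(mu * lam)) >= 0
        t_star = 0.5 * math.log(mu * lam)
        print("  attained:", abs(g(t_star, lam, mu) - bound) < 1e-12)
```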

Lemma A.2.3. Let $(M, \mathcal{B}, \mu)$ be a measure space. For each $n \in \mathbb{Z}_{\geq 1} \cup \{\infty\}$, let $f_n : M \to \mathbb{C}$ be a measurable function. Then the following statements are equivalent:

1. $f_n \to f_\infty$ µ-almost everywhere as $n \to \infty$;

2. the set
\[ C = \bigcap_{m=1}^{\infty} \bigcup_{N=1}^{\infty} \bigcap_{n=N}^{\infty} (|f_n - f_\infty|)^{-1}\left[0, \tfrac{1}{m}\right) \]
has full measure.

Proof. Suppose that $f_n \to f_\infty$ µ-almost everywhere. The definition of µ-almost everywhere convergence is that there is a set of full measure $\Omega_1 \subseteq M$ such that for every $\omega \in \Omega_1$ and all $\epsilon > 0$, there is an $N(\omega, \epsilon)$ such that for all $n \geq N(\omega, \epsilon)$, we have

\[ |f_n(\omega) - f_\infty(\omega)| < \epsilon. \]

Fix $\omega \in \Omega_1$. Then for any $m \geq 1$, there is an $N$ such that for all $n \geq N$, $\omega \in (|f_n - f_\infty|)^{-1}\left[0, \tfrac{1}{m}\right)$; here, $\tfrac{1}{m}$ takes the role of $\epsilon$. We obtain

\[ \omega \in \bigcap_{m=1}^{\infty} \bigcup_{N=1}^{\infty} \bigcap_{n=N}^{\infty} (|f_n - f_\infty|)^{-1}\left[0, \tfrac{1}{m}\right), \]
and so $\Omega_1 \subseteq C$; hence $C$ has full measure.

Conversely, if $C$ has full measure, fix $\omega \in C$ and choose $\epsilon > 0$. Find $m \geq 1$ such that $\tfrac{1}{m} \leq \epsilon$, and observe that there exists some $N_1 \geq 1$ such that

\[ \omega \in \bigcap_{n=N_1}^{\infty} (|f_n - f_\infty|)^{-1}\left[0, \tfrac{1}{m}\right) \subseteq \bigcap_{n=N_1}^{\infty} (|f_n - f_\infty|)^{-1}[0, \epsilon]. \]

Thus for all $n \geq N_1$, $|f_n(\omega) - f_\infty(\omega)| \leq \epsilon$, so we see that $C$ is a set of full measure on which the criterion for µ-almost everywhere convergence is satisfied. Thus the two conditions are equivalent.

Corollary A.2.4. Suppose that $(M, \mathcal{B}, \mu)$ is a probability space and let $f_n : M \to \mathbb{R}_{\geq 0}$ be a sequence of measurable functions converging µ-almost everywhere to the constant $D > 0$. Then for each $\delta \in (0, 1]$ and $D' \in [0, D)$, there exists an $N \in \mathbb{Z}_{\geq 1}$ such that the set
\[ \bigcap_{n=N}^{\infty} f_n^{-1}[D', \infty) \]
has measure at least $1 - \delta$.

Proof. By Lemma A.2.3, the set $C = \bigcap_{m=1}^{\infty} \bigcup_{N=1}^{\infty} \bigcap_{n=N}^{\infty} (|f_n - f_\infty|)^{-1}\left[0, \tfrac{1}{m}\right)$ has full measure, which implies that each set $\bigcup_{N=1}^{\infty} \bigcap_{n=N}^{\infty} (|f_n - f_\infty|)^{-1}\left[0, \tfrac{1}{m}\right)$ also has full measure. Find $m_0$ such that $m_0^{-1} \leq D - D'$. Noting that the sets $\bigcap_{n=N}^{\infty} (|f_n - f_\infty|)^{-1}\left[0, m_0^{-1}\right)$ form an increasing chain in $N$, find $N$ such that this set has measure at least $1 - \delta$. Then we have:

\[ \mu\left( \bigcap_{n=N}^{\infty} f_n^{-1}[D', \infty) \right) \geq \mu\left( \bigcap_{n=N}^{\infty} (|f_n - D|)^{-1}\left[0, m_0^{-1}\right) \right) \geq 1 - \delta, \]

because $|f_n - D| \leq m_0^{-1}$ implies $f_n \geq D - m_0^{-1} \geq D'$.

Lemma A.2.5. Let $(X, \tau)$ be a regular (not necessarily $T_0$) space. Suppose that for any closed set $C \subseteq X$ there exists a countable collection of Borel measurable sets $\{V_{C,j}\}_j$, each containing an open set containing $C$, such that if $x_\nu$ is a net that is eventually in every $V_{C,j}$, then every accumulation point of $x_\nu$ is in $C$. If $f_n : (\Omega, \mathcal{A}) \to X$ is Borel measurable for each $n$ and $f_n(\omega) \to f(\omega)$ for all $\omega \in \Omega$, then $f : (\Omega, \mathcal{A}) \to X$ is Borel measurable.

Proof. Let $C \subseteq X$ be closed. We will show that

\[ f^{-1}(C) = \bigcap_{j=1}^{\infty} \bigcup_{N=1}^{\infty} \bigcap_{n=N}^{\infty} f_n^{-1}(V_{C,j}). \]

First, suppose that $f(\omega) \in C$. Then $f_n(\omega)$ is eventually in each $V_{C,j}$, since $V_{C,j}$ contains an open set containing $C$, so $\omega \in \bigcap_{j=1}^{\infty} \bigcup_{N=1}^{\infty} \bigcap_{n=N}^{\infty} f_n^{-1}(V_{C,j})$. Conversely, suppose that $\omega$ is in $\bigcap_{j=1}^{\infty} \bigcup_{N=1}^{\infty} \bigcap_{n=N}^{\infty} f_n^{-1}(V_{C,j})$. Then the sequence $f_n(\omega)$ is eventually in every $V_{C,j}$, so by hypothesis every accumulation point of $f_n(\omega)$ lies in $C$; since $f(\omega)$ is the limit of $f_n(\omega)$, it is such an accumulation point, and we see that $f(\omega) \in C$.

Corollary A.2.6. If $(X, d)$ is a metric space, then $(X, d)$ satisfies the hypotheses of Lemma A.2.5.

Proof. Let $C \subseteq X$ be closed, and for $\epsilon > 0$ let $C_\epsilon = \{x \in X : d(x, C) < \epsilon\}$, where $d(x, C)$ is the standard distance of $x$ from the set $C$. Each $C_\epsilon$ is open, hence Borel measurable,

and contains $C$. Set $V_{C,j} = C_{j^{-1}}$ for each $j \geq 1$. Let $x_\nu$ be a net in $X$ that is eventually in each $V_{C,j}$ ($X$ is first-countable, so $x_\nu$ might as well be a sequence), and let $y \notin C$. Find $j \geq 1$ such that $j^{-1} \leq \tfrac{1}{2} d(y, C)$ (possible since $C$ is closed and $y \notin C$, so $d(y, C) > 0$), and find $N$ such that for all $\nu \geq N$, $x_\nu \in C_{j^{-1}}$. For each $\nu \geq N$, find $z_\nu \in C$ such that $d(z_\nu, x_\nu) < j^{-1}$; then we have
\[ d(x_\nu, y) \geq d(y, z_\nu) - d(x_\nu, z_\nu) \geq d(y, C) - \tfrac{1}{2} d(y, C) = \tfrac{1}{2} d(y, C). \]

Thus $x_\nu$ cannot accumulate at $y$, so $x_\nu$ can only accumulate at points in $C$.

A.3 Miscellaneous Examples

Example A.3.1. Not every piecewise $C^1$ map $T$, as defined in Example 3.2.5, has $|T'|^{-1}$ of finite variation on intervals of monotonicity. For instance, consider the following example (pulled from [38]). Let $h : [0, 1] \to \mathbb{R}$ be given by $h(0) = 1$ and $h(x) = 1 + x\,|\sin(\pi/x)|$ for $x \in (0, 1]$. The function $h$ is easily seen to be continuous on $[0, 1]$. Define $T : [0, 1] \to [0, 1]$ by setting $c = \int_0^1 h \, d\lambda$ (where $\lambda$ is normalized Lebesgue measure) and
\[ T(x) = \frac{1}{c} \int_0^x h \, d\lambda. \]
Then $T$ is a $C^1$ map, where $T' = c^{-1} h \geq c^{-1}$ uniformly in $x$, so that $T$ is monotone on $[0, 1]$. However, $T'$ is not of bounded variation. To see this (unfortunate) fact, consider the partition $\left\{ \frac{1}{2p + 1/2 - i/2} \right\}_{i=0}^{4p-4}$ and compute:

\begin{align*}
\sum_{i=1}^{4p-4} \left| h\!\left(\frac{1}{2p + 1/2 - i/2}\right) - h\!\left(\frac{1}{2p + 1/2 - (1+i)/2}\right) \right|
&= \sum_{i=1}^{4p-4} \left| \frac{|\sin(\pi(2p + 1/2 - i/2))|}{2p + 1/2 - i/2} - \frac{|\sin(\pi(2p + 1/2 - (1+i)/2))|}{2p + 1/2 - (1+i)/2} \right| \\
&= \sum_{i=1}^{4p-4} \left| \frac{\delta_{i\,\mathrm{even}}}{2p + 1/2 - i/2} - \frac{\delta_{i\,\mathrm{odd}}}{2p + 1/2 - (1+i)/2} \right| \\
&= \sum_{i=1}^{4p-4} \frac{1}{2p + 1/2 - \lceil i/2 \rceil} \geq \sum_{i=1}^{4p-4} \frac{1}{2p + 1/2 - i/2} = \sum_{j=5}^{4p} \frac{2}{j}.
\end{align*}

This last sum diverges as $p$ tends to infinity, so $h$ does not have finite variation, hence $T'$ also does not have finite variation. Then, from the calculation in Example 3.2.5, we have:

\begin{align*}
\sum_{i=1}^{k} |g(x_i) - g(x_{i-1})| &= \sum_{i=1}^{k} \frac{1}{|T'(x_i)|\,|T'(x_{i-1})|} \left| T'(x_i) - T'(x_{i-1}) \right| \\
&\geq \frac{c^2}{4} \sum_{i=1}^{k} \left| T'(x_i) - T'(x_{i-1}) \right|.
\end{align*}

Since $T'$ does not have finite variation, $g$ cannot have finite variation either (here, we used the fact that $h$ is bounded on $[0, 1]$ by 2, so that $|T'| \leq 2c^{-1}$).
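The divergence of these variation sums is also visible numerically. A minimal sketch, using only the function $h$ and the partition defined above:

```python
# Sketch: variation of h(x) = 1 + x|sin(pi/x)| over the partition points
# x_i = 1/(2p + 1/2 - i/2), i = 0, ..., 4p - 4.
import math

def h(x):
    return 1.0 + x * abs(math.sin(math.pi / x))   # x > 0 at all partition points

for p in (10, 100, 1000, 10000):
    pts = [1.0 / (2 * p + 0.5 - i / 2.0) for i in range(4 * p - 3)]
    var = sum(abs(h(pts[i]) - h(pts[i - 1])) for i in range(1, len(pts)))
    print(p, var)
```

The totals grow logarithmically in $p$, consistent with the harmonic-type lower bound above, so the variation of $h$ is unbounded.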

Example A.3.2. We use the techniques from Chapter 5 to approach Example 4.4.2 using the Markov map technology. Recall that we had the rigid rotation on $\mathbb{Z}/d\mathbb{Z}$ and maps $T_i$ given by $T_i(x) = ix \pmod 1$. The map $T_i$ is Markov, with Markov partition $\{((k-1)/i, k/i)\}_{k=1}^{i}$, and since the slopes are constant and the branches are onto, we compute the adjacency matrix to be the $i$-by-$i$ all-ones matrix,
\[ A_i = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}. \]

By Lemma 5.1.2 and the fact that the kernel of $A_i$ is $(i-1)$-dimensional, we see that the only element of the spectrum of $L_i$ outside of the essential spectrum is the leading eigenvalue of $i^{-1} A_i$, which (of course) is 1. The essential spectral radius is $i^{-1}$, and so in Example 4.4.2, the bound of $-d^{-1} \log(d!)$ for the second-largest Lyapunov exponent arises from taking an average over $d$ steps and looking at the essential spectrum of $A_{d!}$.
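The spectral claim about the all-ones matrix is immediate to confirm numerically; the sketch below (the use of numpy here is incidental) exhibits the simple eigenvalue 1 of $i^{-1} A_i$ together with the eigenvalue 0 of multiplicity $i - 1$:

```python
# Sketch: eigenvalues of (1/i) * A_i, where A_i is the i-by-i all-ones matrix.
import numpy as np

for i in (2, 3, 5, 8):
    M = np.ones((i, i)) / i
    print(i, np.round(np.linalg.eigvalsh(M), 10))   # i - 1 zeros, then 1
```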

Bibliography

[1] T. Andô. On fundamental properties of a Banach space with a cone. Pacific J. Math., 12:1163–1169, 1962.

[2] L. Arnold, V. M. Gundlach, and L. Demetrius. Evolutionary formalism for products of positive random matrices. Ann. Appl. Probab., 4(3):859–901, 1994.

[3] S. Axler. Down with determinants! Amer. Math. Monthly, 102(2):139–154, 1995.

[4] V. Baladi. The quest for the ultimate anisotropic Banach space. J. Stat. Phys., 166(3-4):525–557, 2017.

[5] G. Birkhoff. Extensions of Jentzsch's theorem. Trans. Amer. Math. Soc., 85:219–227, 1957.

[6] M. Blank and G. Keller. Random perturbations of chaotic dynamical systems: stability of the spectrum. Nonlinearity, 11(5):1351–1364, 1998.

[7] A. Boyarsky and P. Góra. Laws of chaos. Probability and its Applications. Birkhäuser Boston, Inc., Boston, MA, 1997. Invariant measures and dynamical systems in one dimension.

[8] J. Buzzi. Exponential decay of correlations for random Lasota-Yorke maps. Comm. Math. Phys., 208(1):25–54, 1999.

[9] J. Ding, Q. Du, and T. Y. Li. The spectral analysis of Frobenius-Perron operators. J. Math. Anal. Appl., 184(2):285–301, 1994.

[10] D. Dolgopyat and P. Wright. The diffusion coefficient for piecewise expanding maps of the interval with metastable states. Stoch. Dyn., 12(1):1150005, 13, 2012.

[11] D. S. Dummit and R. M. Foote. Abstract algebra. John Wiley & Sons, Inc., Hoboken, NJ, third edition, 2004.

[12] P. Eslami and P. Góra. Stronger Lasota-Yorke inequality for one-dimensional piecewise expanding transformations. Proc. Amer. Math. Soc., 141(12):4249–4260, 2013.

[13] I. V. Evstigneev and S. A. Pirogov. Stochastic nonlinear Perron-Frobenius theorem. Positivity, 14(1):43–57, 2010.

[14] P. Ferrero and B. Schmitt. Produits aléatoires d'opérateurs matrices de transfert. Probab. Theory Related Fields, 79(2):227–248, 1988.

[15] G. Frobenius. Über Matrizen aus nicht negativen Elementen. In Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften, pages 456–477, 1912.

[16] G. Froyland, S. Lloyd, and A. Quas. Coherent structures and isolated spectrum for Perron-Frobenius cocycles. Ergodic Theory Dynam. Systems, 30(3):729–756, 2010.

[17] G. Froyland, S. Lloyd, and A. Quas. A semi-invertible Oseledets theorem with applications to transfer operator cocycles. Discrete Contin. Dyn. Syst., 33(9):3835–3860, 2013.

[18] C. González-Tokman, B. R. Hunt, and P. Wright. Approximating invariant densities of metastable systems. Ergodic Theory Dynam. Systems, 31(5):1345–1361, 2011.

[19] C. González-Tokman and A. Quas. A semi-invertible operator Oseledets theorem. Ergodic Theory Dynam. Systems, 34(4):1230–1272, 2014.

[20] H. Hennion. Sur un théorème spectral et son application aux noyaux lipchitziens. Proc. Amer. Math. Soc., 118(2):627–634, 1993.

[21] H. Hennion. Limit theorems for products of positive random matrices. Ann. Probab., 25(4):1545–1587, 1997.

[22] F. Hofbauer and G. Keller. Ergodic properties of invariant measures for piecewise monotonic transformations. Math. Z., 180(1):119–140, 1982.

[23] J. Horan. Asymptotics for the second-largest Lyapunov exponent for some Perron- Frobenius operator cocycles. Preprint: https://arxiv.org/abs/1910.12112.

[24] J. Horan. Dynamical spectrum via determinant-free linear algebra. Preprint: https://arxiv.org/abs/2001.06788.

[25] R. A. Horn and C. R. Johnson. Matrix analysis. Cambridge University Press, Cambridge, second edition, 2013.

[26] G. Keller. On the rate of convergence to equilibrium in one-dimensional systems. Comm. Math. Phys., 96(2):181–193, 1984.

[27] G. Keller and C. Liverani. Stability of the spectrum for transfer operators. Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4), 28(1):141–152, 1999.

[28] M. G. Kreĭn and M. A. Rutman. Linear operators leaving invariant a cone in a Banach space. Uspehi Matem. Nauk (N. S.), 3(1(23)):3–95, 1948.

[29] A. Lasota and J. A. Yorke. On the existence of invariant measures for piecewise monotonic transformations. Trans. Amer. Math. Soc., 186:481–488 (1974), 1973.

[30] D. A. Levin, Y. Peres, and E. L. Wilmer. Markov chains and mixing times. American Mathematical Society, Providence, RI, 2009. With a chapter by James G. Propp and David B. Wilson.

[31] Z. Lian and K. Lu. Lyapunov exponents and invariant manifolds for random dynamical systems in a Banach space. Mem. Amer. Math. Soc., 206(967):vi+106, 2010.

[32] Z. Lian and Y. Wang. On random linear dynamical systems in a Banach space. I. Multiplicative ergodic theorem and Krein-Rutman type theorems. Adv. Math., 312:374–424, 2017.

[33] C. Liverani. Decay of correlations. Ann. of Math. (2), 142(2):239–301, 1995.

[34] C. Liverani. Decay of correlations for piecewise expanding maps. J. Statist. Phys., 78(3-4):1111–1129, 1995.

[35] R. Mañé. Lyapounov exponents and stable manifolds for compact transformations. In Geometric dynamics (Rio de Janeiro, 1981), volume 1007 of Lecture Notes in Math., pages 522–577. Springer, Berlin, 1983.

[36] W. A. McWorter, Jr. and L. F. Meyers. Computing eigenvalues and eigenvectors without determinants. Math. Mag., 71(1):24–33, 1998.

[37] R. E. Megginson. An introduction to Banach space theory, volume 183 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1998.

[38] J.-P. Merx. A continuous function which is not of bounded variation, 2014. https://www.mathcounterexamples.net/a-continuous-function-which-is-not-of-bounded-variation/. Accessed online on February 20, 2020.

[39] J. Mierczyński and W. Shen. Principal Lyapunov exponents and principal Floquet spaces of positive random dynamical systems. I. General theory. Trans. Amer. Math. Soc., 365(10):5329–5365, 2013.

[40] V. I. Oseledec. A multiplicative ergodic theorem. Characteristic Ljapunov, exponents of dynamical systems. Trudy Moskov. Mat. Obšč., 19:179–210, 1968.

[41] A. L. Peressini. Ordered topological vector spaces. Harper & Row, Publishers, New York-London, 1967.

[42] O. Perron. Zur Theorie der Matrices. Math. Ann., 64(2):248–263, 1907.

[43] R. R. Phelps. Support cones in Banach spaces and their applications. Advances in Math., 13:1–19, 1974.

[44] M. S. Raghunathan. A proof of Oseledec’s multiplicative ergodic theorem. Israel J. Math., 32(4):356–362, 1979.

[45] W. Rudin. Real and complex analysis. McGraw-Hill Book Co., New York, third edition, 1987.

[46] D. Ruelle. Characteristic exponents and invariant manifolds in Hilbert space. Ann. of Math. (2), 115(2):243–290, 1982.

[47] H. H. Rugh. Cones and gauges in complex spaces: spectral gaps and complex Perron-Frobenius theory. Ann. of Math. (2), 171(3):1707–1752, 2010.

[48] M. Rychlik. Bounded variation and invariant measures. Studia Math., 76(1):69–80, 1983.

[49] H. H. Schaefer. Topological vector spaces. Springer-Verlag, New York-Berlin, 1971. Third printing corrected, Graduate Texts in Mathematics, Vol. 3.

[50] P. Thieullen. Fibrés dynamiques asymptotiquement compacts. Exposants de Lyapounov. Entropie. Dimension. Ann. Inst. H. Poincaré Anal. Non Linéaire, 4(1):49–97, 1987.

[51] D. M. Zhuang. Bases of convex cones and Borwein’s proper efficiency. J. Optim. Theory Appl., 71(3):613–620, 1991.
