Topology, Morphisms, and Randomness in the Space of Formal Languages David E

University of South Florida Scholar Commons Graduate Theses and Dissertations Graduate School 6-20-2005 Topology, Morphisms, and Randomness in the Space of Formal Languages David E. Kephart University of South Florida Follow this and additional works at: https://scholarcommons.usf.edu/etd Part of the American Studies Commons Scholar Commons Citation Kephart, David E., "Topology, Morphisms, and Randomness in the Space of Formal Languages" (2005). Graduate Theses and Dissertations. https://scholarcommons.usf.edu/etd/721 This Dissertation is brought to you for free and open access by the Graduate School at Scholar Commons. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Scholar Commons. For more information, please contact [email protected]. Topology, Morphisms, and Randomness in the Space of Formal Languages by David E. Kephart A dissertation submitted in partial fulfillment of the requirements for the degree of Doctorate of Philosophy Department of Mathematics College of Arts and Sciences University of South Florida Major Professor: Nata¸saJonoska, Ph.D. Gregory McColm, Ph.D. Arunava Mukherjea, Ph.D. Masahiko Saito, Ph.D. Stephen Suen, Ph.D. Date of Approval: June 20, 2005 Keywords: entropy, language norm, language distance, language space, pseudo-metric, symbolic dynamics °c Copyright 2005, David E. Kephart Dedication Lacking a reason, she said, why continue? In her calm question I gained purpose And now she requests I not mention her name! On such irony, life turns. Yes, I could name others, my brother Michael, It would be fitting, but not complete; to Her of long suffering and practical optimism, of Up-turned face and crystalline realism— I dedicate this to my wife, who gave me how and why to go on. Acknowledgments Not one word (or symbol!) of this dissertation could be conceived without the constant, one might say unrelenting support of Natasha Jonoska. Her insight into language theory, her awareness of the examples, counter-examples and interconnections within the field, her anticipation of the thousand practical details that attend every scientific endeavor, her rigorous criticism, her refusal to settle for less than what can actually be obtained from a subject, not to mention her frequent and telling suggestions: all of these were vital to the production of this work. I can only hope to have in some degree merited such unstinting aid. It has also been my great privilege to work with committee members of stature in the mathemat- ical community. I thanks them, each and severally, for having generously contributed their time and skills (not to mention patience) to reviewing and critiquing this dissertation, despite myriad other demands upon their crowded schedules. It has been my good fortune to have studied subjects vital to the production of this dissertation with Greg McColm, Arunava Mukherjea, and Masahiko Saito. Stephen Suen, though he may not know it, inspired some of my first forays into problems of discrete mathematics. And I thank Dewey Rundus, chairman of my committee, for taking on that obligation without a qualm. There are many other faculty members who have helped me and without whom I would have but a vague idea of what mathematics is all about. In particular, I thank Jogindar Ratti for his encour- agement over the years, Brian Curtin for numerous enlightening discussions, Evguenii Rakhmanov for a comprehensive vision of complex analysis and approximation theory, and also Edwin Clark for bringing Normalized Google Distance to my attention. I have also incurred a deep debt to Scott Rimbey, who has rescued me on numerous occasions when health and my naivete´ led me into conflicts with my teaching obligations. The past five years at the University of South Florida have been remarkable because I have studied with a class of graduate students of truly imposing ability. There is not space for every name, but my special thanks goes to Edgardo Cureg both for his assistance with the calculations required in Chapter Six and for many other favors along the way, to Michiru Shibata, who has assisted me both with English and schedule conflicts, and to Luis Camara, the first student math speaker I heard on the campus. But I must give a special credit and appreciation to Kalpana Mahalingam, due to whose tenacity I first entered the remarkable world of language theory. Like her advisor, she knew what was best for me. A final word of thanks to Aya Ossowski, who not only informed me of the due dates and format requirements, but assisted me enthusiastically in meeting them. All shortcomings which remain in this document, and I am uncomfortably aware that they exist, may be credited to myself alone. They are to be regarded as errors to which I have somehow clung despite everything Natasha, and everyone else, has done on my behalf. The research for this dissertation was sponsored by the National Science Foundation, through grants CCA #0432009 and EIA #0086015. Table of Contents List of Tables . iii List of Figures . iv Abstract . v Chapter 1 Formalizing distance and topology on a space of languages . 1 1.1 Are languages the support of a meaningful distance? . 1 1.2 Basic definitions . 5 1.2.1 General notation . 5 1.2.2 Languages and language families . 6 1.2.3 Language spaces . 9 1.3 Morphisms on words and language spaces . 10 1.3.1 Extending literal mappings and word morphisms to language morphisms . 11 1.3.2 Language Space Isomorphisms . 20 1.3.3 The permutative set difference of languages . 22 1.4 Language pseudo-metrics and language norms . 24 1.4.1 Definition of language pseudo-metric and language norm . 25 1.4.2 Link between language norms and language pseudo-metrics . 26 1.5 Random languages . 29 1.5.1 Martin-Lof¨ randomness tests . 31 1.5.2 A general approach to randomness in topological language spaces . 33 Chapter 2 Formal language space as a Cantor space . 35 2.1 The Cantor language norm and distance . 36 2.1.1 Definition . 36 2.1.2 Language cylinder sets and their enumeration . 38 2.2 The Cantor topology on a language space . 41 2.3 Randomness under Cantor distance . 42 2.3.1 An upper-semi-computable measure on open sets . 42 2.3.2 Examples of nonrandom languages . 46 2.3.3 Random equals non-recursively enumerable . 47 Chapter 3 The Besicovitch language pseudo-metric . 50 3.1 A Besicovitch pseudo-metric on language spaces . 53 3.2 The Besicovitch language norm is surjective . 56 3.3 Besicovitch distance quotient space . 58 3.3.1 The Besicovitch distance equivalence relation and induced quotient space . 59 3.3.2 The metric quotient topology . 61 3.3.3 Convergence in the quotient space . 64 i 3.3.4 The quotient space is perfect, but not compact . 64 3.3.5 An upper quotient space, homeomorphic to the unit interval . 69 3.4 The geometry of the Besicovitch topology . 76 3.4.1 A point in the quotient space . 78 3.4.2 Open and closed neighborhoods of languages . 79 3.4.3 Ideals and the elements of the upper quotient space . 81 3.5 The Chomsky hierarchy revisited . 83 3.5.1 The finite languages are not dense . 83 3.5.2 All locally testable languages have norm zero . 84 3.5.3 Regular languages . 86 3.5.4 Non-RE languages . 87 3.6 Random Languages under the Besicovitch pseudo-metric topology . 88 Chapter 4 The entropic pseudo-metric . 92 4.1 Definition of entropy and entropic distance . 93 4.1.1 Entropy: the rate of exponential language growth . 94 4.1.2 The entropic language norm is surjective. 96 4.2 The entropic quotient space . 97 4.2.1 Definition of the entropic quotient spaces . 98 4.2.2 A point in the quotient space . 100 4.2.3 Elements of the upper quotient space . 103 4.3 Entropy and the Chomsky hierarchy . 108 4.3.1 The finite languages . 108 4.3.2 Locally testable languages . 109 4.4 Randomness in the entropy topology . 110 4.4.1 Separability . 110 4.4.2 Measure . 111 4.4.3 Randomness tests . 111 4.4.4 Random languages . 111 Chapter 5 Language morphisms and continuity . 113 5.1 The continuity of morphisms . 114 5.2 The pre-image of a section of words . 117 5.2.1 The word symbol matrices and symbol-length vectors of morphisms . 118 5.2.2 Distribution of the morphic image of a language space . 121 Chapter 6 Conclusions regarding distances between language; what is to be done . 124 6.1 Conclusions regarding the Cantor, Besicovitch, and entropy topologies . 124 6.2 Tasks yet to be finished . 127 6.2.1 Distinguishing regular languages metrically . 128 6.2.2 Distances between grammars . 129 6.2.3 Algebraic topology on the space of languages . 129 6.2.4 The dL metric . 130 6.3 Conclusion . 131 References . 134 About the Author . End Page ii List of Tables 1 Language distances between small languages . 126 2 Language distances between large languages . 127 iii List of Figures 1 Isometry between Besicovitch language space and quotient space . 62 2 The Besicovitch quotient spaces. 70 3 The relationship of the Besicovitch language space, quotient space, and unit interval (upper quotient space). 77 4 Word in a generally testable language . 84 5 The quotient spaces of the entropic distance. 106 iv Topology, Morphisms, and Randomness in the Space of Formal Languages David E. Kephart ABSTRACT This paper outlines and implements a systematic approach to the establishment, investigation, and testing of distances and topologies on language spaces. The collection of all languages over a given number of symbols forms a semiring, appropriately termed a language space.

Load more