
<p>Catching Words in a Stream of Speech: </p><p>Computational Simulations of <br>Segmenting Transcribed Child-Directed Speech </p><p>Çağrı Çöltekin </p><p>CLCG </p><p>The work presented here was carried out under the auspices of the School of Behavioural and Cognitive Neuroscience and the Center for Language and Cognition Groningen of the Faculty of Arts of the University of Groningen. </p><p>Groningen Dissertations in Linguistics 97 <br>ISSN 0928-0030 </p><p>© 2011, Çağrı Çöltekin </p><p>This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. To view a copy of this license, visit <a href="/goto?url=http://creativecommons.org/licenses/by-nc-sa/3.0/" target="_blank">http://creativecommons.org/licenses/by-nc-sa/3.0/ </a>or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA. </p><p>Cover art, titled auto, by Franek Timur Çöltekin </p><p>Document prepared with LaTeX 2ε and typeset by pdfTeX <br>Printed by Wöhrmann Print Service, Zutphen </p><p>RIJKSUNIVERSITEIT GRONINGEN </p><p>Catching Words in a Stream of Speech </p><p>Computational Simulations of Segmenting Transcribed Child-Directed Speech </p><p>Proefschrift ter verkrijging van het doctoraat in de Letteren aan de Rijksuniversiteit Groningen op gezag van de Rector Magnificus, dr. E. Sterken, in het openbaar te verdedigen op donderdag 8 december 2011 om 14.30 uur </p><p>door </p><p>Çağrı Çöltekin </p><p>geboren op 28 februari 1972 <br>Çıldır, Turkije </p><p>Promotor: Prof. dr. ir. J. Nerbonne </p><p>Beoordelingscommissie: Prof. dr. A. van den Bosch <br>Prof. dr. P. Hendriks <br>Prof. dr. P. 
Monaghan </p><p>ISBN (electronic version): 978-90-367-5259-6 <br>ISBN (print version): 978-90-367-5232-9 </p><p>Preface </p><p>I started my PhD project with a more ambitious goal than what has been achieved in this dissertation. I wanted to touch on most issues of language acquisition, developing computational models for a wider range of phenomena. In particular, I wanted to focus on models of learning linguistic ‘structure’, as it is typically observed in morphology and syntax. Segmentation was one of the annoying tasks that I could not easily step over, because I was also interested in morphology. So, I decided to write a chapter on segmentation. Despite the fact that segmentation is considered relatively easy by many (in comparison to learning syntax, for example), and that it is relatively well studied, every step I took in modeling this task revealed another interesting problem I could not just gloss over. In the end, the initial ‘chapter’ became the dissertation you have in front of you. I believe I have a far better understanding of the problem now, but I also have many more questions than I started with. </p><p>The structure of the project, and my wanderings in the landscape of language acquisition, did not allow me to work with many other people. As a result, this dissertation has been completed in a more independent setting than most other PhD dissertations. Nevertheless, this thesis benefited from my interactions with others. I will try to acknowledge the direct or indirect help I received during this work, but it is likely that I will fail to mention everyone. I apologize in advance to anyone whom I might have unintentionally left out. </p><p>First of all, my sincere thanks go to my supervisor John Nerbonne. Here, I do not use the word sincere for stylistic reasons. 
All PhD students acknowledge their supervisor(s), but if you think they all mean it, you probably have not talked to many of them. Besides the valuable comments on the content of my work, I got the attention and encouragement I needed, when I needed it. He patiently read all my early drafts on short notice, even correcting my never-ending English mistakes. </p><p>I would also like to thank Antal van den Bosch, Petra Hendriks and Padraic Monaghan for agreeing to read and evaluate my thesis. Their comments and criticisms improved the final draft, and made me look at the issues discussed in the thesis from different perspectives. In earlier stages of my PhD project, I also received valuable comments and criticisms from Kees de Bot and Tamás Biró. Although the focus of the project changed substantially, the benefit of their comments remains. Later, regular discussions with fellow PhD students Barbara Plank, Dörte Hessler and Peter Nabende kept me on track, and I got particularly valuable comments from Barbara on some of the content presented here. Barbara and Dörte also get additional thanks for agreeing to be my paranymphs during the defense. </p><p>A substantial part of completing a PhD requires writing. Writing at a reasonable academic level is a difficult task, writing in a foreign language is even more difficult, and writing in a language in which you barely understand the basics is almost impossible. First, thanks to Daniël de Kok for helping me with the impossible, and translating the summary into Dutch. Second, I shamelessly used some people close to me for proofreading, on short notice, without much display of appreciation. My many mistakes in earlier drafts of this dissertation were eliminated with the help of Joanna Krawczyk-Çöltekin, Arzu Çöltekin, Barbara Plank and Asena and Giray Devlet. 
I am grateful for their help, as well as their friendship. </p><p>My interest in language and language acquisition that led to this rather late PhD goes back to my undergraduate years at Middle East Technical University. I am likely to omit many people here, because many years have passed since. However, the help I got and the things I learned from Cem Bozşahin and Deniz Zeyrek are difficult to forget. I am particularly grateful to Cem Bozşahin for his encouragement and his patience in supervising my many MSc thesis attempts. </p><p>This thesis also owes a lot to people whom I cannot all name here. I would like to thank those who shared their data, their code, and their wisdom. This thesis would not have been possible without the many freely available sources of information, tools and data that we take for granted nowadays—just to name a few: GNU utilities, R, LaTeX, CHILDES. </p><p>There is more to life than research, and you realize it more if you move to a new city. Many people made life in Groningen more pleasant during my PhD time here. First, I feel fortunate to be in Alfa-informatica. Besides my colleagues in the department, the people of the foreign guest club and the international service desk of the university made my life more pleasant and less painful. I am reluctant to name people individually because of the certainty of omissions. Nevertheless, here is an incomplete list of people I feel lucky to have met during this time, sorted randomly: Ellen and John Nerbonne, Kostadin Cholakov, Laura Fahnenbruck, Dörte Hessler, Ildikó Berzlánovich, Gisi Cannizzaro, Tim Van de Cruys, Daniël de Kok, Martijn Wieling, Jelena Prokić, Aysa Arylova, Barbara Plank, Martin Meraner, Gideon Kotzé, Zhenya Markovskaya, Radek Šimík, Jörg Tiedemann, Tal Caspi, Jelle Wouda. 
</p><p>My parents, Hoşnaz and Selçuk Çöltekin, have always been supportive, but they also encouraged my curiosity from the very start. My interest in linguistics likely goes back to a description of Turkish morphology among my father's notes. I still pursue the same fascination I felt when I realized there was a neat explanation for something I knew intuitively. My sister, Arzu, has always been there for me, not only as a supportive family member: I also benefited a lot from our discussions about my work, which sometimes made me feel that she should have been doing what I do. Lastly, many thanks to the two people who suffered most from my work on the thesis by being closest to me, Aska and Franek, for their help, support and patience. </p><p>Contents </p><p>List of Abbreviations . . . xi <br>1 Introduction . . . 1 <br>2 The Problem of Language Acquisition . . . 5 <br>2.1 The nature–nurture debate . . . 6 <br>2.1.1 Difficulty of learning languages . . . 8 <br>2.1.2 How quick is quick enough? . . . 10 <br>2.1.3 Critical periods . . . 11 <br>2.1.4 Summary . . . 12 <br>2.2 Theories of language acquisition . . . 14 <br>2.2.1 Parametric theories . . . 14 <br>2.2.2 Connectionist models . . . 17 <br>2.2.3 Usage based theories . . . 18 <br>2.2.4 Statistical models . . . 19 <br>2.3 Summary and discussion . . . 20 <br>3 Formal Models . . . 23 <br>3.1 Formal models in science . . . 24 <br>3.2 Computational learning theory . . . 25 <br>3.2.1 Chomsky hierarchy of languages . . . 26 <br>3.2.2 Identification in the limit . . . 27 <br>3.2.3 Probably approximately correct learning . . . 29 <br>3.3 Learning theory and language acquisition . . . 31 <br>3.3.1 The class of natural languages and learnability . . . 33 <br>3.3.2 The input to the language learner . . . 35 <br>3.3.3 Are natural languages provably (un)learnable? . . . 37 <br>3.4 What do computational models explain? . . . 38 <br>3.5 Computational simulations . . . 39 <br>3.6 Summary . . . 42 <br>4 Lexical Segmentation: An Introduction . . . 45 <br>4.1 Word recognition . . . 47 <br>4.2 Lexical segmentation . . . 49 <br>4.2.1 Predictability and distributional regularities . . . 50 <br>4.2.2 Prosodic cues . . . 52 <br>4.2.3 Phonotactics . . . 54 <br>4.2.4 Pauses and utterance boundaries . . . 55 <br>4.2.5 Words that occur in isolation and short utterances . . . 55 <br>4.2.6 Other cues for segmentation . . . 56 <br>4.3 Segmentation cues: combination and comparison . . . 57 <br>5 Computational Models of Segmentation . . . 59 <br>5.1 The input . . . 61 <br>5.2 Processing strategy . . . 63 <br>5.2.1 Guessing boundaries . . . 63 <br>5.2.2 Recognition strategy . . . 65 <br>5.3 The search space . . . 67 <br>5.3.1 Exact search using dynamic programming . . . 68 <br>5.3.2 Approximate search procedures . . . 69 <br>5.3.3 Search for boundaries . . . 71 <br>5.4 Performance and evaluation . . . 71 <br>5.5 Two reference models . . . 76 <br>5.5.1 A random segmentation model . . . 76 <br>5.5.2 A reference model using language-modeling strategy . . . 76 <br>5.6 Summary and discussion . . . 82 <br>6 Segmentation using Predictability Statistics . . . 85 <br>6.1 Measures of predictability for segmentation . . . 86 <br>6.1.1 Boundary probability . . . 86 <br>6.1.2 Transitional probability . . . 88 <br>6.1.3 Pointwise mutual information . . . 90 <br>6.1.4 Successor variety . . . 91 <br>6.1.5 Boundary entropy . . . 93 <br>6.1.6 Effects of phoneme context . . . 94 <br>6.1.7 Predicting the past . . . 
98 <br>6.1.8 Predictability measures: summary and discussion . . . 100 <br>6.2 A predictability based segmentation model . . . 102 <br>6.2.1 Peaks in unpredictability . . . 102 <br>6.2.2 Combining multiple measures and varying phoneme context . . . 106 <br>6.2.3 Weighing the competence of the voters . . . 109 <br>6.2.4 Two sides of a peak . . . 111 <br>6.2.5 Reducing redundancy . . . 112 <br>6.3 Summary and discussion . . . 113 <br>7 Learning from Utterance Boundaries . . . 115 <br>7.1 Related work . . . 116 <br>7.2 Do utterance boundaries provide cues for word boundaries? . . . 117 <br>7.3 An unsupervised learner using utterance boundaries . . . 119 <br>7.4 Combining measures and cues . . . 122 <br>7.5 Summary . . . 124 <br>8 Learning From Past Decisions . . . 127 <br>8.1 Using previously discovered words . . . 128 <br>8.1.1 What makes a good word? . . . 130 <br>8.1.2 Related work . . . 131 <br>8.2 Using known words for segmentation: a computational model . . . 132 <br>8.2.1 The computational model . . . 132 <br>8.2.2 Performance . . . 134 <br>8.3 Making use of lexical stress . . . 136 <br>8.3.1 Related work . . . 137 <br>8.3.2 An analysis of stress patterns in the BR corpus . . . 139 <br>8.3.3 Segmentation using lexical stress . . . 141 <br>8.4 Summary . . . 143 <br>9 An Overview of the Segmentation Strategies . . . 147 <br>9.1 The measures and the models . . . 147 <br>9.2 Performance . . . 148 <br>9.3 Making use of prior information . . . 152 <br>9.4 Variation in the input . . . 155 <br>9.5 Qualitative evaluation . . . 157 <br>9.6 The learning method . . . 161 <br>9.7 Summary . . . 164 <br>10 Conclusions . . . 165 <br>Bibliography . . . 169 <br>Short Summary . . . 187 <br>Samenvatting in het Nederlands . . . 191 <br>A The Corpora . . . 193 <br>B Detailed Performance Results . . . 197 </p><p>List of Abbreviations </p><table><tr><td>ANN</td><td>Artificial Neural Network</td><td>17</td></tr><tr><td>APS</td><td>Argument from poverty of stimulus</td><td>35</td></tr><tr><td>BF</td><td>Boundary F-score</td><td>73</td></tr><tr><td>BP</td><td>Boundary precision</td><td>73</td></tr><tr><td>BR</td><td>Boundary recall</td><td>73</td></tr><tr><td>BTP</td><td>Backwards or reverse TP</td><td>99</td></tr><tr><td>CDS</td><td>Child Directed Speech</td><td>61</td></tr><tr><td>CHILDES</td><td>Child language data exchange system</td><td>46</td></tr><tr><td>H</td><td>Boundary entropy</td><td>94</td></tr><tr><td>IIL</td><td>Identification in the limit (also used as ‘identifiable in the limit’)</td><td>27</td></tr><tr><td>LAD</td><td>Language Acquisition Device</td><td>6</td></tr><tr><td>LF</td><td>Lexical (or word-type) f-score</td><td>73</td></tr><tr><td>LP</td><td>Lexical (or word-type) precision</td><td>73</td></tr><tr><td>LR</td><td>Lexical (or word-type) recall</td><td>73</td></tr><tr><td>MDL</td><td>Minimum description length</td><td>65</td></tr><tr><td>MI</td><td>(Pointwise) Mutual Information</td><td>90</td></tr><tr><td>PAC</td><td>Probably approximately correct (learning)</td><td>29</td></tr><tr><td>PM</td><td>Predictability-based Segmentation model</td><td>113</td></tr><tr><td>PUM</td><td>Segmentation model that uses predictability and utterance boundaries</td><td>123</td></tr><tr><td>PUWM</td><td>Segmentation model that uses predictability, utterance boundaries and known words</td><td>135</td></tr><tr><td>PUWSM</td><td>Segmentation model that uses all cues: predictability, utterance boundaries, known words and stress</td><td>143</td></tr><tr><td>PV</td><td>Predecessor Value</td><td>99</td></tr><tr><td>RM</td><td>Random Model</td><td>76</td></tr><tr><td>SM</td><td>Stress-based Segmentation model</td><td>142</td></tr><tr><td>SRN</td><td>Simple recurrent network</td><td>18, 63</td></tr><tr><td>SV</td><td>Successor Variety</td><td>91</td></tr><tr><td>TP</td><td>Transitional probability</td><td>88</td></tr><tr><td>UG</td><td>Universal Grammar</td><td>6</td></tr><tr><td>UM</td><td>Utterance boundary-based Segmentation model</td><td>123</td></tr><tr><td>VC dimension</td><td>Vapnik-Chervonenkis dimension</td><td>30</td></tr><tr><td>WF</td><td>Word F-score</td><td>73</td></tr><tr><td>WM</td><td>Lexicon-based Segmentation model</td><td>132</td></tr><tr><td>WP</td><td>Word precision</td><td>73</td></tr><tr><td>WR</td><td>Word recall</td><td>73</td></tr></table><p>1 Introduction </p><p>A witty saying proves nothing. <br>Voltaire </p><p>The aim of language acquisition research is to understand how children learn the languages spoken in their environment. This study contributes to this purpose by investigating one of the first steps of the language acquisition process, the discovery of words in the speech stream directed to children, by means of computational simulations. 
</p><p>We take words for granted: we identify them effortlessly when listening to others speaking a language we understand, and we use them to construct utterances possibly never uttered before. We learn the sound forms of words, associate them with meanings, and discover how to use them appropriately in the company of other words and in the presence of different people. Despite the apparent ease with which we process and learn words, learning a proper set of words, a lexicon, to communicate effectively with our environment is a challenging task. The challenge starts with identifying these words in a continuous speech stream. Unlike written text, where we typically put white spaces between the words, the speech signal does not contain analogous reliable markers for word boundaries. </p><p>A competent language user is aided, to some extent, by his/her knowledge of words to extract them from a continuous stream: itisannoyingbutyouprobablycanfigureoutthewordsinthissequence. However, at the beginning of their journey to becoming competent speakers, children do not know the words of the language they are acquiring. As a result, they cannot make use of words. If this is not convincing, try to locate the word boundaries in this sentence: eğertürkçebilmiyorsanızbudizidekisözcükleribulmanızçokzor. This is approximately what happens when you hear an unfamiliar language (in this case, Turkish). Without knowing the words of the input language, discovering words in a continuous speech stream does not seem possible, which leads to a chicken-and-egg problem. In spoken language, we are not as helpless as in a written stream of letters: there are several acoustic cues that indicate word boundaries. However, although these cues correlate with the boundaries, they are known to be insufficient, noisy and sometimes in conflict with each other. 
Furthermore, the cues are language dependent; that is, one needs to know the boundaries to learn when and how these cues correlate with the boundaries. We are back to the chicken-and-egg problem again. </p><p>Fortunately, there are also some simple and general segmentation strategies that seem to work universally for all languages. As a byproduct of the fact that the natural speech stream is formed by concatenating words, the flow of basic units (such as syllables or phonemes) in an utterance follows certain statistical regularities. Particularly, the basic units within words predict one another in sequence, while units across boundaries do not. It is even more encouraging that children seem to be sensitive to these statistics at a very young age. Another source of information for word boundaries that does not require knowledge of words in advance comes from utterance boundaries. Utterance boundaries are also word boundaries, and words are formed according to certain regularities, for example, they share common beginnings and endings. This provides another means of discovering words before knowing them. Once we start discovering words using these general strategies, we can also learn to use the language-specific </p>
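<p>The within-word predictability idea can be illustrated with a small simulation. This is a toy sketch only, not one of the models developed in this dissertation: the three-word artificial lexicon and all names in the code are invented for illustration. Transitional probabilities between syllables are high inside words and dip at word boundaries, so positing a boundary at each local minimum of the transitional probability recovers the words. </p>

```python
import random
from collections import Counter

# Toy "language" of three trisyllabic words (invented for this sketch).
lexicon = [["bi", "da", "ku"], ["pa", "do", "ti"], ["go", "la", "bu"]]

# Build an unsegmented stream of syllables by concatenating random words,
# the way a learner hears speech without word boundaries.
random.seed(1)
stream = []
for _ in range(300):
    stream.extend(random.choice(lexicon))

# Transitional probability TP(y|x) = count(x followed by y) / count(x).
unigrams = Counter(stream)
bigrams = Counter(zip(stream, stream[1:]))

def tp(x, y):
    return bigrams[(x, y)] / unigrams[x]

# Within a word the next syllable is fully predictable (TP = 1.0);
# across a word boundary it is not (TP is roughly 1/3 here).  Posit a
# boundary wherever the TP sequence has a strict local minimum.
tps = [tp(x, y) for x, y in zip(stream, stream[1:])]
boundaries = [i + 1 for i in range(1, len(tps) - 1)
              if tps[i] < tps[i - 1] and tps[i] < tps[i + 1]]

# Cut the stream at the posited boundaries.
segments, prev = [], 0
for b in boundaries:
    segments.append(stream[prev:b])
    prev = b
segments.append(stream[prev:])
```

<p>With deterministic within-word transitions, every across-word transition is a strict local minimum, so every recovered segment is a word of the toy lexicon. Real child-directed speech is far messier, which is why chapter 6 examines noisy predictability statistics and their combination with other cues. </p>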