
<p>Catching Words in a Stream of Speech: </p><p>Computational Simulations of <br>Segmenting Transcribed Child-Directed Speech </p><p>Çağrı Çöltekin </p><p>CLCG </p><p>The work presented here was carried out under the auspices of the School of Behavioural and Cognitive Neuroscience and the Center for Language and Cognition Groningen of the Faculty of Arts of the University of Groningen. </p><p>Groningen Dissertations in Linguistics 97 <br>ISSN 0928-0030 </p><p>© 2011, Çağrı Çöltekin </p><p>This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. To view a copy of this license, visit <a href="/goto?url=http://creativecommons.org/licenses/by-nc-sa/3.0/" target="_blank">http://creativecommons.org/licenses/by-nc-sa/3.0/ </a>or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA. </p><p>Cover art, titled auto, by Franek Timur Çöltekin </p><p>Document prepared with LaTeX 2ε and typeset by pdfTeX <br>Printed by Wöhrmann Print Service, Zutphen </p><p>RIJKSUNIVERSITEIT GRONINGEN </p><p>Catching Words in a Stream of Speech </p><p>Computational Simulations of Segmenting Transcribed Child-Directed Speech </p><p>Proefschrift ter verkrijging van het doctoraat in de Letteren aan de Rijksuniversiteit Groningen op gezag van de Rector Magnificus, dr. E. Sterken, in het openbaar te verdedigen op donderdag 8 december 2011 om 14.30 uur </p><p>door </p><p>Çağrı Çöltekin </p><p>geboren op 28 februari 1972 <br>Çıldır, Turkije </p><p>Promotor: Prof. dr. ir. J. Nerbonne </p><p>Beoordelingscommissie: Prof. dr. A. van den Bosch <br>Prof. dr. P. Hendriks <br>Prof. dr. P. 
Monaghan </p><p>ISBN (electronic version): 978-90-367-5259-6 <br>ISBN (print version): 978-90-367-5232-9 </p><p>Preface </p><p>I started my PhD project with a more ambitious goal than what has been achieved in this dissertation. I wanted to touch on most issues of language acquisition, developing computational models for a wider range of phenomena. In particular, I wanted to focus on models of learning linguistic ‘structure’, as it is typically observed in morphology and syntax. Segmentation was one of the annoying tasks that I could not easily step over, because I was also interested in morphology. So, I decided to write a chapter on segmentation. Despite the fact that segmentation is considered relatively easy by many (in comparison to learning syntax, for example), and that it is relatively well studied, every step I took in modeling this task revealed another interesting problem I could not just gloss over. In the end, the initial ‘chapter’ became the dissertation you have in front of you. I believe I have a far better understanding of the problem now, but I also have many more questions than I started with. </p><p>The structure of the project, and my wanderings in the landscape of language acquisition, did not allow me to work with many other people. As a result, this dissertation has been completed in a more independent setting than most other PhD dissertations. Nevertheless, this thesis benefited from my interactions with others. I will try to acknowledge the direct or indirect help I received during this work, but it is likely that I will fail to mention everyone. I apologize in advance to anyone whom I might have unintentionally left out. </p><p>First of all, my sincere thanks go to my supervisor John Nerbonne. Here, I do not use the word sincere for stylistic reasons. 
All PhD students acknowledge their supervisor(s), but if you think they all mean it, you probably have not talked to many of them. Besides the valuable comments on the content of my work, I got the attention and encouragement I needed, when I needed it. He patiently read all my early drafts on short notice, even correcting my never-ending English mistakes. </p><p>I would also like to thank Antal van den Bosch, Petra Hendriks and Padraic Monaghan for agreeing to read and evaluate my thesis. Their comments and criticisms improved the final draft, and made me look at the issues discussed in the thesis from different perspectives. In earlier stages of my PhD project, I also received valuable comments and criticisms from Kees de Bot and Tamás Biró. Although the focus of the project changed substantially, the benefit of their comments remains. Later, regular discussions with fellow PhD students Barbara Plank, Dörte Hessler and Peter Nabende kept me on track, and I got particularly valuable comments from Barbara on some of the content presented here. Barbara and Dörte also get additional thanks for agreeing to be my paranymphs during the defense. </p><p>A substantial part of completing a PhD requires writing. Writing at a reasonable academic level is a difficult task, writing in a foreign language is even more difficult, and writing in a language in which you barely understand the basics is almost impossible. First, thanks to Daniël de Kok for helping me with the impossible, and translating the summary into Dutch. Second, I shamelessly used some people close to me for proofreading, on short notice, without much display of appreciation. My many mistakes in earlier drafts of this dissertation were eliminated with the help of Joanna Krawczyk-Çöltekin, Arzu Çöltekin, Barbara Plank and Asena and Giray Devlet. 
I am grateful for their help, as well as their friendship. </p><p>My interest in language and language acquisition that led to this rather late PhD goes back to my undergraduate years at Middle East Technical University. I am likely to omit many people here, because many years have passed since. However, the help I got and the things I learned from Cem Bozşahin and Deniz Zeyrek are difficult to forget. I am particularly grateful to Cem Bozşahin for his encouragement and his patience in supervising my many MSc thesis attempts. </p><p>This thesis also owes a lot to people whom I cannot all name here. I would like to thank those who shared their data, their code, and their wisdom. This thesis would not have been possible without the many freely available sources of information, tools and data that we take for granted nowadays—just to name a few: GNU utilities, R, LaTeX, CHILDES. </p><p>There is more to life than research, and you realize it more if you move to a new city. Many people made life in Groningen more pleasant during my PhD time here. First, I feel fortunate to be in Alfa-informatica. Besides my colleagues in the department, the people of the foreign guest club and the international service desk of the university made my life more pleasant and less painful. I am reluctant to name people individually because of the certainty of omissions. Nevertheless, here is an incomplete list of people I feel lucky to have met during this time, sorted randomly: Ellen and John Nerbonne, Kostadin Cholakov, Laura Fahnenbruck, Dörte Hessler, Ildikó Berzlánovich, Gisi Cannizzaro, Tim Van de Cruys, Daniël de Kok, Martijn Wieling, Jelena Prokić, Aysa Arylova, Barbara Plank, Martin Meraner, Gideon Kotzé, Zhenya Markovskaya, Radek Šimík, Jörg Tiedemann, Tal Caspi, Jelle Wouda. 
</p><p>My parents, Hoşnaz and Selçuk Çöltekin, have always been supportive, but they also encouraged my curiosity from the very start. My interest in linguistics likely goes back to a description of Turkish morphology among my father's notes. I still pursue the same fascination I felt when I realized there was a neat explanation for something I knew intuitively. My sister, Arzu, has always been there for me, not only as a supportive family member: I also benefited a lot from our discussions about my work, which sometimes made me feel that she should have been doing what I do. Lastly, many thanks to the two people who suffered most from my work on the thesis by being closest to me, Aska and Franek, for their help, support and patience. </p><p>Contents </p><p>List of Abbreviations . . . xi <br>1 Introduction . . . 1 <br>2 The Problem of Language Acquisition . . . 5 <br>2.1 The nature–nurture debate . . . 6 <br>2.1.1 Difficulty of learning languages . . . 8 <br>2.1.2 How quick is quick enough? . . . 10 <br>2.1.3 Critical periods . . . 11 <br>2.1.4 Summary . . . 12 <br>2.2 Theories of language acquisition . . . 14 <br>2.2.1 Parametric theories . . . 14 <br>2.2.2 Connectionist models . . . 17 <br>2.2.3 Usage based theories . . . 18 <br>2.2.4 Statistical models . . . 19 <br>2.3 Summary and discussion . . . 20 <br>3 Formal Models . . . 23 <br>3.1 Formal models in science . . . 24 <br>3.2 Computational learning theory . . . 25 <br>3.2.1 Chomsky hierarchy of languages . . . 26 <br>3.2.2 Identification in the limit . . . 27 <br>3.2.3 Probably approximately correct learning . . . 29 <br>3.3 Learning theory and language acquisition . . . 31 <br>3.3.1 The class of natural languages and learnability . . . 33 <br>3.3.2 The input to the language learner . . . 35 <br>3.3.3 Are natural languages provably (un)learnable? . . . 37 <br>3.4 What do computational models explain? . . . 38 <br>3.5 Computational simulations . . . 39 <br>3.6 Summary . . . 42 <br>4 Lexical Segmentation: An Introduction . . . 45 <br>4.1 Word recognition . . . 47 <br>4.2 Lexical segmentation . . . 49 <br>4.2.1 Predictability and distributional regularities . . . 50 <br>4.2.2 Prosodic cues . . . 52 <br>4.2.3 Phonotactics . . . 54 <br>4.2.4 Pauses and utterance boundaries . . . 55 <br>4.2.5 Words that occur in isolation and short utterances . . . 55 <br>4.2.6 Other cues for segmentation . . . 56 <br>4.3 Segmentation cues: combination and comparison . . . 57 <br>5 Computational Models of Segmentation . . . 59 <br>5.1 The input . . . 61 <br>5.2 Processing strategy . . . 63 <br>5.2.1 Guessing boundaries . . . 63 <br>5.2.2 Recognition strategy . . . 65 <br>5.3 The search space . . . 67 <br>5.3.1 Exact search using dynamic programming . . . 68 <br>5.3.2 Approximate search procedures . . . 69 <br>5.3.3 Search for boundaries . . . 71 <br>5.4 Performance and evaluation . . . 71 <br>5.5 Two reference models . . . 76 <br>5.5.1 A random segmentation model . . . 76 <br>5.5.2 A reference model using language-modeling strategy . . . 76 <br>5.6 Summary and discussion . . . 82 <br>6 Segmentation using Predictability Statistics . . . 85 <br>6.1 Measures of predictability for segmentation . . . 86 <br>6.1.1 Boundary probability . . . 86 <br>6.1.2 Transitional probability . . . 88 <br>6.1.3 Pointwise mutual information . . . 90 <br>6.1.4 Successor variety . . . 91 <br>6.1.5 Boundary entropy . . . 93 <br>6.1.6 Effects of phoneme context . . . 94 <br>6.1.7 Predicting the past . . . 
98 <br>6.1.8 Predictability measures: summary and discussion . . . 100 <br>6.2 A predictability based segmentation model . . . 102 <br>6.2.1 Peaks in unpredictability . . . 102 <br>6.2.2 Combining multiple measures and varying phoneme context . . . 106 <br>6.2.3 Weighing the competence of the voters . . . 109 <br>6.2.4 Two sides of a peak . . . 111 <br>6.2.5 Reducing redundancy . . . 112 <br>6.3 Summary and discussion . . . 113 <br>7 Learning from Utterance Boundaries . . . 115 <br>7.1 Related work . . . 116 <br>7.2 Do utterance boundaries provide cues for word boundaries? . . . 117 <br>7.3 An unsupervised learner using utterance boundaries . . . 119 <br>7.4 Combining measures and cues . . . 122 <br>7.5 Summary . . . 124 <br>8 Learning From Past Decisions . . . 127 <br>8.1 Using previously discovered words . . . 128 <br>8.1.1 What makes a good word? . . . 130 <br>8.1.2 Related work . . . 131 <br>8.2 Using known words for segmentation: a computational model . . . 132 <br>8.2.1 The computational model . . . 132 <br>8.2.2 Performance . . . 134 <br>8.3 Making use of lexical stress . . . 136 <br>8.3.1 Related work . . . 137 <br>8.3.2 An analysis of stress patterns in the BR corpus . . . 139 <br>8.3.3 Segmentation using lexical stress . . . 141 <br>8.4 Summary . . . 143 <br>9 An Overview of the Segmentation Strategies . . . 147 <br>9.1 The measures and the models . . . 147 <br>9.2 Performance . . . 148 <br>9.3 Making use of prior information . . . 152 <br>9.4 Variation in the input . . . 155 <br>9.5 Qualitative evaluation . . . 157 <br>9.6 The learning method . . . 161 <br>9.7 Summary . . . 164 <br>10 Conclusions . . . 165 <br>Bibliography . . . 169 <br>Short Summary . . . 187 <br>Samenvatting in het Nederlands . . . 191 <br>A The Corpora . . . 193 <br>B Detailed Performance Results . . . 197 </p><p>List of Abbreviations </p><table><tr><td>ANN</td><td>Artificial Neural Network</td><td>17</td></tr><tr><td>APS</td><td>Argument from poverty of stimulus</td><td>35</td></tr><tr><td>BF</td><td>Boundary F-score</td><td>73</td></tr><tr><td>BP</td><td>Boundary precision</td><td>73</td></tr><tr><td>BR</td><td>Boundary recall</td><td>73</td></tr><tr><td>BTP</td><td>Backwards or reverse TP</td><td>99</td></tr><tr><td>CDS</td><td>Child Directed Speech</td><td>61</td></tr><tr><td>CHILDES</td><td>Child language data exchange system</td><td>46</td></tr><tr><td>H</td><td>Boundary entropy</td><td>94</td></tr><tr><td>IIL</td><td>Identification in the limit (also used as ‘identifiable in the limit’)</td><td>27</td></tr><tr><td>LAD</td><td>Language Acquisition Device</td><td>6</td></tr><tr><td>LF</td><td>Lexical (or word-type) f-score</td><td>73</td></tr><tr><td>LP</td><td>Lexical (or word-type) precision</td><td>73</td></tr><tr><td>LR</td><td>Lexical (or word-type) recall</td><td>73</td></tr><tr><td>MDL</td><td>Minimum description length</td><td>65</td></tr><tr><td>MI</td><td>(Pointwise) Mutual Information</td><td>90</td></tr><tr><td>PAC</td><td>Probably approximately correct (learning)</td><td>29</td></tr><tr><td>PM</td><td>Predictability-based Segmentation model</td><td>113</td></tr><tr><td>PUM</td><td>Segmentation model that uses predictability and utterance boundaries</td><td>123</td></tr><tr><td>PUWM</td><td>Segmentation model that uses predictability, utterance boundaries and known words</td><td>135</td></tr><tr><td>PUWSM</td><td>Segmentation model that uses all cues: predictability, utterance boundaries, known words and stress</td><td>143</td></tr><tr><td>PV</td><td>Predecessor Value</td><td>99</td></tr><tr><td>RM</td><td>Random Model</td><td>76</td></tr><tr><td>SM</td><td>Stress-based Segmentation model</td><td>142</td></tr><tr><td>SRN</td><td>Simple recurrent network</td><td>18, 63</td></tr><tr><td>SV</td><td>Successor Variety</td><td>91</td></tr><tr><td>TP</td><td>Transitional probability</td><td>88</td></tr><tr><td>UG</td><td>Universal Grammar</td><td>6</td></tr><tr><td>UM</td><td>Utterance boundary-based Segmentation model</td><td>123</td></tr><tr><td>VC dimension</td><td>Vapnik-Chervonenkis dimension</td><td>30</td></tr><tr><td>WF</td><td>Word F-score</td><td>73</td></tr><tr><td>WM</td><td>Lexicon-based Segmentation model</td><td>132</td></tr><tr><td>WP</td><td>Word precision</td><td>73</td></tr><tr><td>WR</td><td>Word recall</td><td>73</td></tr></table><p>1 Introduction </p><p>A witty saying proves nothing. <br>Voltaire </p><p>The aim of language acquisition research is to understand how children learn the languages spoken in their environment. This study contributes to this purpose by investigating one of the first steps of the language acquisition process, the discovery of words in the speech stream directed to children, by means of computational simulations. 
</p><p>We take words for granted: we identify them effortlessly when listening to others speaking a language we understand, and we use them to construct utterances possibly never uttered before. We learn the sound forms of words, associate them with meanings, and discover how to use them appropriately in the company of other words and in the presence of different people. Despite the apparent ease with which we process and learn words, learning a proper set of words, a lexicon, to communicate effectively with our environment is a challenging task. The challenge starts with identifying these words in a continuous speech stream. Unlike written text, where we typically put white spaces between the words, the speech signal does not contain analogous reliable markers for word boundaries. </p><p>A competent language user is aided, to some extent, by his/her knowledge of words to extract them from a continuous stream: itisannoyingbutyouprobablycanfigureoutthewordsinthissequence. However, at the beginning of their journey to becoming competent speakers, children do not know the words of the language they are acquiring. As a result, they cannot make use of words. If this is not convincing, try to locate the word boundaries in this sentence: eğertürkçebilmiyorsanızbudizidekisözcükleribulmanızçokzor. This is approximately what happens when you hear an unfamiliar language (in this case, Turkish). Without knowing the words of the input language, discovering words in a continuous speech stream does not seem possible, which leads to a chicken-and-egg problem. In spoken language, we are not as helpless as in a written stream of letters: there are several acoustic cues that indicate word boundaries. However, although these cues correlate with the boundaries, they are known to be insufficient, noisy and sometimes in conflict with each other. 
Furthermore, the cues are language dependent; that is, one needs to know the boundaries to learn when and how these cues correlate with the boundaries. We are back to the chicken-and-egg problem again. </p><p>Fortunately, there are also some simple and general segmentation strategies that seem to work universally for all languages. As a byproduct of the fact that the natural speech stream is formed by concatenating words, the flow of basic units (such as syllables or phonemes) in an utterance follows certain statistical regularities. Particularly, the basic units within words predict one another in sequence, while units across boundaries do not. It is even more encouraging that children seem to be sensitive to these statistics at a very young age. Another source of information for word boundaries that does not require knowledge of words in advance comes from utterance boundaries. Utterance boundaries are also word boundaries, and words are formed according to certain regularities, for example, they share common beginnings and endings. This provides another means of discovering words before knowing them. Once we start discovering words using these general strategies, we can also learn to use the language-specific </p>
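<p>The within-word predictability idea can be illustrated with a small simulation. This is a toy sketch only, not one of the models developed in this dissertation: the three-word artificial lexicon and all names in the code are invented for illustration. Transitional probabilities between syllables are high inside words and dip at word boundaries, so positing a boundary at each local minimum of the transitional probability recovers the words. </p>

```python
import random
from collections import Counter

# Toy "language" of three trisyllabic words (invented for this sketch).
lexicon = [["bi", "da", "ku"], ["pa", "do", "ti"], ["go", "la", "bu"]]

# Build an unsegmented stream of syllables by concatenating random words,
# the way a learner hears speech without word boundaries.
random.seed(1)
stream = []
for _ in range(300):
    stream.extend(random.choice(lexicon))

# Transitional probability TP(y|x) = count(x followed by y) / count(x).
unigrams = Counter(stream)
bigrams = Counter(zip(stream, stream[1:]))

def tp(x, y):
    return bigrams[(x, y)] / unigrams[x]

# Within a word the next syllable is fully predictable (TP = 1.0);
# across a word boundary it is not (TP is roughly 1/3 here).  Posit a
# boundary wherever the TP sequence has a strict local minimum.
tps = [tp(x, y) for x, y in zip(stream, stream[1:])]
boundaries = [i + 1 for i in range(1, len(tps) - 1)
              if tps[i] < tps[i - 1] and tps[i] < tps[i + 1]]

# Cut the stream at the posited boundaries.
segments, prev = [], 0
for b in boundaries:
    segments.append(stream[prev:b])
    prev = b
segments.append(stream[prev:])
```

<p>With deterministic within-word transitions, every across-word transition is a strict local minimum, so every recovered segment is a word of the toy lexicon. Real child-directed speech is far messier, which is why chapter 6 examines noisy predictability statistics and their combination with other cues. </p>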