Biases in Segmenting Non-concatenative Morphology

by

Michelle Alison Fullwood

B.A., Cornell University (2004)

Submitted to the Department of Linguistics and Philosophy in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Linguistics at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 2018

© Massachusetts Institute of Technology 2018. All rights reserved.

Author: Signature redacted
Department of Linguistics and Philosophy
September 1, 2018

Certified by: Signature redacted
Adam Albright
Professor of Linguistics
Thesis Supervisor

Accepted by: Signature redacted
Adam Albright
Professor of Linguistics
Linguistics Section Head

Submitted to the Department of Linguistics and Philosophy on September 1, 2018, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Linguistics

Abstract

Segmentation of words containing non-concatenative morphology into their component morphemes, such as Arabic /kita:b/ 'book' into the root /ktb/ and the vocalism /i-a:/ (McCarthy, 1979, 1981), is difficult because the space of possible segmentations grows exponentially as word length increases, compared with the linear growth that accompanies concatenative morphology.

In this dissertation, I use computational and typological simulations, as well as an artificial grammar experiment, to investigate the task of morphological segmentation in root-and-pattern languages, and the consequences for majority-concatenative languages such as English when we do not presuppose concatenative segmentation and its smaller hypothesis space.
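To make the exponential-versus-linear contrast concrete, the following Python sketch (an illustration, not code from this dissertation) counts candidate segmentations of a word into exactly two morphemes. It assumes a word is a list of segments; a concatenative split cuts the list at one internal boundary, while a non-concatenative split assigns every segment to one of two labelled morphemes (say, root versus residue), preserving linear order within each. The function names are hypothetical.

```python
from itertools import product


def concatenative_splits(segments):
    """Hypothetical helper: split into two contiguous, non-empty morphemes.

    There is one candidate per internal boundary, so n segments give n - 1.
    """
    return [(segments[:i], segments[i:]) for i in range(1, len(segments))]


def nonconcatenative_splits(segments):
    """Hypothetical helper: assign each segment to morpheme A (0) or B (1).

    Segments keep their linear order inside each morpheme. Of the 2**n
    labellings, the two that leave one morpheme empty are discarded.
    """
    splits = []
    for labels in product((0, 1), repeat=len(segments)):
        a = [s for s, lab in zip(segments, labels) if lab == 0]
        b = [s for s, lab in zip(segments, labels) if lab == 1]
        if a and b:
            splits.append((a, b))
    return splits


# /kita:b/ 'book', treating the long vowel a: as a single segment
word = ["k", "i", "t", "a:", "b"]
print(len(concatenative_splits(word)))     # 4
print(len(nonconcatenative_splits(word)))  # 30

# The root /ktb/ plus vocalism /i-a:/ exists only in the non-concatenative space
root_split = (["k", "t", "b"], ["i", "a:"])
print(root_split in nonconcatenative_splits(word))  # True
print(root_split in concatenative_splits(word))     # False

# Prints n, n - 1, and 2**n - 2 for each word length n
for n in range(2, 9):
    segs = [str(i) for i in range(n)]
    print(n, len(concatenative_splits(segs)), len(nonconcatenative_splits(segs)))
```

The larger space contains the intended root/vocalism division, but also many implausible interleavings, which is the kind of search problem that the biases examined below are hypothesised to constrain.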
In particular, I examine the necessity and sufficiency conditions of three biases that may be hypothesised to govern the learning of such a segmentation: a bias towards a parsimonious morpheme lexicon with a power-law (Zipfian) distribution over tokens drawn from this lexicon, as has been used successfully in Bayesian models of word segmentation and of morphological segmentation in concatenative languages (Goldwater et al., 2009; Poon et al., 2009, et seq.); a bias towards concatenativity; and a bias against interleaving morphemes that are mixtures of consonants and vowels.

I demonstrate that while, computationally, the parsimony bias is sufficient to segment Arabic verbal stems into roots and residues, typological considerations argue for the existence of biases towards concatenativity and towards separating consonants and vowels in root-and-pattern-style morphology. Further evidence for these as synchronic biases comes from the artificial grammar experiment, which demonstrates that languages respecting these biases have a small but significant learnability advantage.

Thesis Supervisor: Adam Albright
Title: Professor of Linguistics

Acknowledgments

I write these acknowledgments with minutes to go to the official deadline, as one does, but I've been composing them in my head since long before I ever wrote a word of this dissertation, because it took the help and kindness of many different people to get me to the point where I could even write a word. So here goes:

Thanks go first and foremost to Adam Albright, who has been my advisor from my first day at MIT. I'm not sure how someone's worst quality can be that they're too nice, but Adam has managed it. After hearing innumerable horror stories about other people's advisors¹, I've come to realise how lucky I am to have experienced his kindness, guidance, and wide-ranging knowledge of the minutiae of so many languages.
I can only hope that as I go on to supervise other people in my working life, I can be half as patient and understanding as he has been with me through the years.

¹ All from other departments, natch.

Thanks also to the other members of my committee, Donca Steriade and Edward Flemming, who were always there when I needed them, and provided much advice on linguistics, statistics, teaching, and life along the way. Michael Kenstowicz taught me so much about phonology through the classes I took from him and our one-on-one meetings. Tim O'Donnell was the one who started me on the project that grew into this dissertation, and I'm very grateful to him for teaching me almost everything I know about Bayesian statistics and their applications to language.

I'll not hide that I burned out, pretty badly, for a couple of years while working on my dissertation. I'd like to thank MIT Mental Health and Counselling for giving me the tools to work through my anxiety and depression, and all of my professors for being understanding and patient during the dark times. I'd also like to thank in this regard everyone at department HQ, particularly Jen, Matt, Mary and Chrissy, for bailing me out when I missed deadlines because things felt too impossible, and for always being there with smiles and chocolate. To any very lost grad students who might stumble upon this dissertation: if sitting down to work feels incapacitating, please talk to someone. No one I know has enjoyed writing their dissertation, but it shouldn't be an object of terror.

Back to the good times: I loved sharing an office with Suyeon Yun, Yusuke Imanishi, Hrayr Khanjian, and Amanda Swenson, and with the inimitable members of ling-10: Ayaka Sugawara, Coppe van Urk, Gretchen Kern, Isaac Gould, Ryo Masuda, Sam Steddy, Ted Levin, and Wataru Uegaki. Thanks also go to my extended family of officemates in Masako Imanishi, Yuika-chan, Joetaro-kun, and Sally Yun.
Another bright spot was TAing for Donca Steriade and David Pesetsky. Those courses, and the bright undergrads who participated in them, rekindled my love for linguistics when it was flagging.

It may be a trope that the real treasure is the friends you made along the way, but I am very glad that my rocky journey through MIT led, directly or indirectly, to me knowing Abdul-Razak Sulemana, Alya Abbott, Anna Jurgensen, Benjamin Storme, Brian Hsu, Bronwyn Bjorkman, Chew Lin Kay, Chrissy Wheeler, Christine Riordan, Claire Halpert, Danfeng Wu, Despina Oikonomou, Diviya Sinha, Erin Olson, Eva Csipak, Fen Tung, Gaja Jarosz, Giorgio Magri, Grace Chua, Hadas Kotek, Ishani Guha, Janice Lee, Jiahao Chen, Jonah Katz, Laine Stranahan, Lauren Eby Clemens, Lilla Magyar, Louis Tee, Luka Crnic, Maria Giavazzi, Michelle Yuan, Milena Sisovics, mitcho Erlewine, Nina Topintzi, Pamela Pan, Patrick Grosz, Patrick Jones, Pooja Paul, Presley Pizzo, Pritty Patel-Grosz, Ruth Brillman, Sam Al Khatib, Sam Zukoff, Sarah Ouwayda, Sixun Chen, Snejana Iovtcheva, Sunghye Cho, Yang Lin, Yasutada Sudo, Yimei Xiang, Yoong Keok, Youngah Do, Yujing Huang, Yukiko Ueda, among many others.

Thanks also to the many who, wittingly or unwittingly, helped me procrastinate on my dissertation and kept me sane: my puzzle hunt crew, Chelsea, Tim, Jit Hin, Ahmed, Fabian, Nicholas, and Eric; fellow organisers and members of Boston Python, PyLadies Boston, and the Python community at large, including Adam Palay, Lynn Cherny, Liza Daly, Alex Dorsk, Eric Ma, Frederick Wagner, Janet Riley, Jennifer Konikowski, Jenny Cheng, Laura Stone, Leena, Lina Kim, Marianne Foos, and Thao Nguyen; my fellow mapmakers at Maptime Boston, particularly Jake Wasserman and Ray Cha; and fellow volunteers at the Prison Book Program. Muchas gracias also to colleagues at Vista Higher Learning, Cobalt Speech, and Spirit AI, who not only helped me procrastinate but paid me to do it.
Ever since I came to Boston, I've gone back home to Singapore once a year, where my friends have helped me decompress. I'd like to thank in particular the Cornell contingent (Yann Fang, Clara, Peishan, Pris, Ray, Wenshan), my CSIT "bros" (CKL, GKC, TYL, KKH, BS, OYS), and Janet.

In Singapore also are my parents, to whom eternal thanks go for simply everything. They had to leave school at 16 and 18 to start working and supporting their families, but that didn't stop them from having a start-up before start-ups were ever a thing, or from producing two daughters with PhDs. Thank you for always ensuring we had opportunities to read and educate ourselves, while at the same time showing through your own example that you don't need higher education to be wise, and that you definitely don't need a degree to be loved.

To my sister Melissa, I send absolutely no thanks at all for finishing her PhD in three years, though I guess at least that means that between us, we have a respectable average time to completion.

Much love also to "the mob", i.e. my extended family in Singapore: Der Yao, Mema, UTH, Eleanor, Veronica, Kelvin, AWW, AMM, Uncle James, Aunty Siew Lan, Gabriel and Christine; as well as my in-laws, Candy, Dave, Amy, and all the Padowski aunts, uncles, and cousins.

Lastly, we come to my boys: Greg and our purr machines Salem and Morris, who have been here for me through my best days and my worst days, and made all of them brighter just by being there. My love for you is monotonically non-decreasing.

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. 1122374.

Contents

1 Introduction
  1.1 The problem of non-concatenative morphological segmentation
    1.1.1 What about meaning?
  1.2 Outline
  1.3 Models of Arabic morphology
    1.3.1 Root-based accounts
    1.3.2 Non-root-based accounts
    1.3.3 Evidence for the Semitic root
    1.3.4 Evidence for other morphemes in Semitic
    1.3.5 Our model
2 Computational Modelling
  2.1 Bayesian morphological segmentation
    2.1.1 The Goldwater-Griffiths-Johnson model