Distributed Representations and the Uniform Information Density Hypothesis
Universität des Saarlandes

Connectionist Language Production: Distributed Representations and the Uniform Information Density Hypothesis

Dissertation submitted for the academic degree of Doctor of Philosophy at the Philosophische Fakultät of the Universität des Saarlandes, by Jesús Calvillo, from Morelia, Mexico. Saarbrücken, 2019.

Dean of the Faculty: Prof. Dr. Heinrich Schlange-Schöningen
First reviewer: Prof. Dr. Matthew W. Crocker
Second reviewer: Prof. Dr. Stefan Frank
Date of the final examination: 20/05/2019

Acknowledgements

This PhD was a very interesting experience in which I learned plenty, both as a researcher and as a person. During this time I was not alone: I was blessed with many people who helped and taught me, and to whom I will always be grateful.

Among those people, I would like to express my gratitude first and foremost to my advisor, Matthew W. Crocker. He guided me through this PhD while giving me the freedom to explore the ideas that I found most interesting. From him I learned about psycholinguistics, how science works, and how to structure and pursue research, among many other lessons that will help me in my life. He is an excellent person and researcher, and I feel very honoured to have had him as my advisor.

I would also like to acknowledge the help I received from Harm Brouwer. He provided the code to generate the situation state space used throughout this dissertation, as well as comments that helped shape this work. I am grateful for that.

I am also very grateful to my research group, who were always eager to help and give advice. I was very happy to work with you, and I am looking forward to future collaborations and/or drinks.

Not everything is work, and work cannot be everything. There were people who supported me personally during this stage of my life. From them I took the strength, energy and comfort that allowed me to pursue my goals.
For that, I would like to thank my friends who spent some of their time with me throughout these years. I learned so much from you, and I treasure so many memories, that I will always be in your debt.

My family has always supported me in my projects, even when I am so far away. I am very happy and grateful to know that I can always rely on you.

I am also grateful to Saarbrücken, a wonderfully crazy city with amazing people, where I have always felt welcome. It will always be a place close to my heart.

I would also like to thank the Mexican National Council of Science and Technology (CONACYT) for the support that allowed me to pursue this degree.

Finally, I dedicate this work to my father, Antonio Calvillo Paz, who taught me that for every problem there is a solution; you just have to keep looking and never give up.

Connectionist Language Production: Distributed Representations and the Uniform Information Density Hypothesis

Abstract

This dissertation approaches the task of modeling human sentence production from a connectionist point of view, using distributed semantic representations. The main questions it addresses are: (i) whether the distributed semantic representations defined by Frank et al. (2009) are suitable for modeling sentence production with artificial neural networks; (ii) the behavior and internal mechanism of a model that uses these representations together with recurrent neural networks; and (iii) a mechanistic account of the Uniform Information Density hypothesis (UID; Jaeger, 2006; Levy and Jaeger, 2007).

Regarding the first point, the semantic representations of Frank et al. (2009), called situation vectors, are points in a vector space where each vector contains information about the observations in which an event and a corresponding sentence are true. These representations have been successfully used to model language comprehension (e.g., Frank et al., 2009; Venhuizen et al., 2018).
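The construction of such representations can be illustrated with a toy sketch. Everything here is hypothetical: the microworld is random rather than the one of Frank et al. (2009), and the truncated SVD merely stands in for whatever dimensionality reduction the actual construction uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical microworld: each row is one sampled observation,
# each column a basic event; an entry is 1 if the event holds there.
n_observations, n_events = 2000, 44
observations = rng.integers(0, 2, size=(n_observations, n_events))

# A raw "situation vector" for an event is its column transposed:
# one component per observation, marking where the event is true.
raw_situation_vectors = observations.T.astype(float)  # (44, 2000)

# Reduce to a manageable dimensionality via truncated SVD
# (a stand-in). This step is lossy, which is the problem that
# belief vectors are later introduced to work around.
k = 20
centered = raw_situation_vectors - raw_situation_vectors.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
situation_vectors = centered @ vt[:k].T  # (44, 20)
```

The key property preserved by the raw vectors is that two events co-occurring in many observations end up with similar vectors; the reduction keeps most, but not all, of that structure.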
During the construction of these vectors, however, a dimensionality reduction process introduces some loss of information, which causes some aspects of the representations to no longer be recognizable, reducing the performance of a model that uses them. To address this issue, belief vectors are introduced; these can be regarded as an alternative way to obtain semantic representations of manageable dimensionality.

These two types of representations (situation and belief vectors) are evaluated by using them as input to a sentence production model that implements an extension of a Simple Recurrent Network (Elman, 1990). The model was tested under different conditions corresponding to different levels of systematicity, i.e., the ability of a model to generalize from a set of known items to a set of novel ones. Systematicity is an essential attribute for a model of sentence processing, considering that the number of sentences that can be generated for a given language is infinite, so it is not feasible to memorize all possible message-sentence pairs.

The results showed that the model was able to generalize with very high performance in all test conditions, demonstrating systematic behavior. Furthermore, the errors it elicited involved very similar semantic representations, mirroring the speech error literature, which states that speech errors involve elements with semantic or phonological similarity. This result further demonstrates the systematic behavior of the model, as it processes similar semantic representations in a similar way, even when they are new to it.

Regarding the second point, the sentence production model was analyzed in two different ways. First, by looking at the sentences it produces, including the errors elicited, highlighting the difficulties and preferences of the model.
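The overall shape of such a producer can be sketched minimally. This is not the dissertation's model: the class name, dimensions, and untrained random weights are all invented for illustration; only the wiring is the point, namely that the semantics, the previously produced word, and a copied-back hidden state jointly determine the next word.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SRNProducer:
    """Sketch of an Elman-style sentence producer."""

    def __init__(self, sem_dim, vocab_size, hidden_dim):
        s = 0.1
        self.W_sem = rng.normal(0, s, (sem_dim, hidden_dim))
        self.W_word = rng.normal(0, s, (vocab_size, hidden_dim))
        self.W_ctx = rng.normal(0, s, (hidden_dim, hidden_dim))
        self.W_out = rng.normal(0, s, (hidden_dim, vocab_size))
        self.vocab_size = vocab_size
        self.hidden_dim = hidden_dim

    def produce(self, semantics, max_len=10, eos=0):
        context = np.zeros(self.hidden_dim)   # Elman context units
        prev = np.zeros(self.vocab_size)      # no word produced yet
        sentence = []
        for _ in range(max_len):
            # Semantics, previous word, and context feed the hidden layer.
            hidden = sigmoid(semantics @ self.W_sem
                             + prev @ self.W_word
                             + context @ self.W_ctx)
            word = int(np.argmax(hidden @ self.W_out))  # most active word
            sentence.append(word)
            if word == eos:
                break
            prev = np.eye(self.vocab_size)[word]
            context = hidden  # copy hidden state back for the next step
        return sentence
```

Testing systematicity then amounts to querying `produce` with semantic vectors held out during training and checking whether the output sentences are well formed and express the intended message.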
The results revealed that the model learns the syntactic patterns of the language, reflecting its statistical nature, and that its main difficulty arises with very similar semantic representations, for which it sometimes produces unintended sentences that are nevertheless semantically very close to the intended ones. Second, the connection weights and activation patterns of the model were analyzed, yielding an algorithmic account of its internal processing. On this account, the input semantic representation activates the words that are related to its content, giving an idea of their order by providing relatively more activation to words that are likely to appear early in the sentence. Then, at each time step, the previously produced word activates syntactic and semantic constraints on the next word, while the context units of the recurrence preserve information over time, allowing the model to enforce long-distance dependencies. We propose that these results can inform the internal processing of models with similar architectures.

Regarding the third point, an extension of the model is proposed with the goal of modeling UID. According to UID, language production is an efficient process shaped by a tendency to produce linguistic units that distribute information as uniformly as possible and close to the capacity of the communication channel, given the encoding possibilities of the language, thus optimizing the amount of information transmitted per time unit. The extension approaches UID by balancing two different production strategies: one where the model produces the word with the highest probability given the semantics and the previously produced words, and one where the model produces the word that would minimize sentence length given the semantic representation and the previously produced words.
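The balancing of the two strategies can be sketched as a convex mixture of two per-word scores. The function name, the mixing weight `lam`, and the per-word estimates are all hypothetical stand-ins, not the dissertation's actual formulation.

```python
import numpy as np

def choose_next_word(p_next, expected_length, lam):
    """Blend two production strategies: lam = 1 picks the most
    probable next word given the semantics and the words so far;
    lam = 0 picks the word expected to lead to the shortest
    completion of the sentence. Intermediate values trade the
    two criteria off against each other."""
    prob_score = p_next / p_next.sum()
    short_score = 1.0 / expected_length          # shorter is better
    short_score = short_score / short_score.sum()
    return int(np.argmax(lam * prob_score + (1 - lam) * short_score))

# Toy case: word 0 is most probable, word 2 yields the shortest sentence.
p = np.array([0.5, 0.3, 0.2])
lens = np.array([6.0, 5.0, 3.0])
assert choose_next_word(p, lens, lam=1.0) == 0
assert choose_next_word(p, lens, lam=0.0) == 2
```

Sweeping the mixing weight is what lets such a model produce the same message at different levels of information density and uniformity.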
By combining these two strategies, the model was able to produce sentences with different levels of information density and uniformity, providing a first step towards modeling UID at the algorithmic level of analysis.

In sum, the results show that the distributed semantic representations of Frank et al. (2009) can be used to model sentence production, exhibiting systematicity. Moreover, an algorithmic account of the internal behavior of the model was reached, with the potential to generalize to other models with similar architectures. Finally, a model of UID is presented, highlighting some important aspects of UID that need to be addressed in order to go from its formulation at the computational level of analysis to a mechanistic account at the algorithmic level.

Kurzzusammenfassung

Diese Dissertation widmet sich der Aufgabe, die menschliche Satzproduktion aus konnektionistischer Sicht zu modellieren und dabei verteilte semantische Repräsentationen zu verwenden. Die Schwerpunkte werden dabei sein: (i) die Frage, ob die von Frank et al. (2009) definierten verteilten semantischen Repräsentationen geeignet sind, um die Satzproduktion unter Verwendung künstlicher neuronaler Netze zu modellieren; (ii) das Verhalten und der interne Mechanismus eines Modells, das diese Repräsentationen und rekurrente neuronale Netze verwendet; (iii) eine mechanistische Darstellung der Uniform Information Density Hypothesis (UID; Jaeger, 2006; Levy and Jaeger, 2007).

Zunächst sei angenommen, dass die Repräsentationen von Frank et al. (2009), genannt Situation-Vektoren, Punkte in einem Vektorraum sind, in dem jeder Vektor Informationen über Beobachtungen enthält, in denen ein Ereignis und ein entsprechender Satz wahr sind. Diese Repräsentationen wurden erfolgreich verwendet, um Sprachverständnis zu modellieren (z.B. Frank et al., 2009; Venhuizen et al., 2018). Während der Konstruktion dieser Vektoren führt ein Prozess