Two wrongs make a right: Learnability and word order consistency

Padraic Monaghan ([email protected]), Markus Gonitzke ([email protected]), Nick Chater ([email protected])
Department of Psychology, University of Warwick, Coventry CV4 7AL, UK

Abstract

Languages often demonstrate word order inconsistencies, and such inconsistencies ought to make languages harder to acquire. We present an integrative approach exploring the relationship between learnability and word order, incorporating syntactic theory, corpus analyses and computational modelling. We focus on comparisons between English and German, and conclude that inconsistencies may be preserved in the language due to the interaction between several syntactic structures.

Introduction

Inconsistent structures in language are harder for computational systems to learn than consistent structures, whether the inconsistencies are at the syntactic level (Christiansen & Devlin, 1997) or at the lexical level, in terms of grapheme-to-phoneme correspondences (Plaut et al., 1996) or semantic ambiguities (Cottrell, 1986). Several languages are entirely consistent in terms of head position (e.g., the position of the verb in verb phrases), such as Japanese or Irish. However, a degree of inconsistency is present in most languages (Kroch, 2000), even if there is still a high degree of consistency (Lupyan & Christiansen, 2002; van Everbroeck, 1999). One possible contributor to learnability is case-marking, which is particularly useful in languages with relatively free word order (Lupyan & Christiansen, 2002), though this appears to be a necessary rather than a sufficient condition for the learnability of such languages (Kiparsky, 1996).

Viewed in evolutionary terms, languages that are harder to learn are more likely to die out (Christiansen & Devlin, 1997), and given the high rate of change of languages across time, it is a significant challenge to explain how word order inconsistencies are learned within languages.

Our approach to this challenge was to bring together syntactic theory with analyses of the frequencies of different structures in real language corpora, and to combine these with computational modelling. Previous simulations of word order have largely ignored the proportions of different syntactic structures (though with notable exceptions, e.g., MacDonald & Christiansen, 2002). Through the use of real language corpora in modelling, we can determine more precisely the extent to which processes of sequential learning are engaged in language processing. This paper presents a series of corpus analyses and simulations that explore the conditions under which inconsistencies in a language are learnable.

As a test case, we focus on word order in German and English. German and English are particularly interesting for comparison as they have the same root (Baugh & Cable, 1993); however, they differ in several important respects. In main clauses, German has a subject-verb-object (SVO) word order, whereas in subordinate clauses the order is subject-object-verb (SOV), as shown in Sentence 1. The sentences below are annotated with subject (S), object (O), finite verb (Vf), infinite verb (Vi) and complementiser (C) to indicate the structures. In English, in contrast, word order is SVO in both main and subordinate clauses (Sentence 2, a translation of Sentence 1). German and English also differ in terms of verb position in infinite verb phrases. In German, the infinite verb is sentence-final, after the object (Sentence 3), whereas in English the infinite verb occurs after the finite verb and before the object (Sentence 4, a translation of Sentence 3). In subordinate clauses this is complicated further in German: the finite verb moves to a position after the infinite verb, which itself follows the object (Sentence 5), while the word order is unchanged in English (Sentence 6, a translation of Sentence 5).

[1] [S Ich] [Vf benutzte] [O das Werkzeug] [C das] [S ich] [O dir] [Vf gab]
[2] [S I] [Vf used] [O the tool] [C that] [S I] [Vf gave] [O you]
[3] [S Er] [Vf hat] [O seine Meinung] [Vi geändert]
[4] [S He] [Vf has] [Vi changed] [O his opinion]
[5] [S Er] [Vf kauft] [O den Teppich] [C da] [S sie] [O die alte] [Vi zerstört] [Vf haben]
[6] [S He] [Vf buys] [O the carpet] [C since] [S they] [Vf have] [Vi destroyed] [O the old one]
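Stripped of their lexical content, these six sentences reduce to the category sequences below; this is essentially the form in which sentence structure enters the corpus analyses and simulations reported later. The encoding is our own illustration, using the S/O/Vf/Vi/C annotation above:

```python
# Sentences [1]-[6] as category sequences (illustrative encoding, not
# taken from the paper).
EXAMPLES = {
    1: ["S", "Vf", "O", "C", "S", "O", "Vf"],        # German: SVO main, SOV subordinate
    2: ["S", "Vf", "O", "C", "S", "Vf", "O"],        # English: SVO throughout
    3: ["S", "Vf", "O", "Vi"],                       # German: infinite verb sentence-final
    4: ["S", "Vf", "Vi", "O"],                       # English: infinite verb before object
    5: ["S", "Vf", "O", "C", "S", "O", "Vi", "Vf"],  # German: Vf after Vi in subordinate
    6: ["S", "Vf", "O", "C", "S", "Vf", "Vi", "O"],  # English: order unchanged
}
```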
Such differences are real-world examples from languages with the same origin that share many similarities. The claims we make from our synthesis of corpus analyses and computational modelling generalise to word order consistency in general. However, the two languages we have selected are of especial interest as an example because English used to have the SOV structure in subordinate clauses, but this has now changed to SVO, which is consistent with main clause word order (Lightfoot, 1991). Subordinate clauses constitute only a small proportion of phrases, and so the different word order in a minority of clauses is puzzling in its persistence – the greater frequency of SVO has not overwhelmed the SOV structure. Indeed, SVO structures are as easily parsed as SOV structures in subordinate clauses in studies of German speakers (Weyerts et al., 2002). We suggest that general sequential learning behaviour, as reflected in simple recurrent networks, contributes towards preserving such inconsistencies in German word order.

In this paper, we explore three grammar fragments of German, compared to the corresponding fragment in English. We postulate that, though subordinate structures may be harder to learn in German, the occurrence of verb-final structures in main clause infinite verb phrases results in easier learning of these structures. Finite verbs in final position are rarer than infinite verbs in final position (26% compared to 74%), and the verb-final position of infinite verbs is acquired earlier than the position of the finite verb (Clahsen & Muysken, 1986). This suggests that verb ordering in German is influenced by the occurrence of both finite and infinite verb phrases. Finally, we make predictions about the scaffolding of relatively infrequent word order inconsistencies through interaction with other, more frequent structures. We first detail how we combined corpus analyses with modelling in our comparisons between English and German.

Corpus data in modelling

MacDonald and Christiansen (2002) illustrated that the different frequencies of linguistic structures have an impact on their ease of processing. It is therefore extremely useful to have a representation of the relative frequencies of different structures in languages in order to make assertions about the ease of acquisition of inconsistencies that may occur within a language. Such frequency information is not usually employed in modelling syntactic structures, with models instead training on corpora generated from randomised proportions of grammatical rules.

There are two major influences on the ability of simple recurrent networks to learn sequential orders. The first is predictability in word order: if a noun is always followed by a verb, for example, then the verb can be more easily predicted in the grammar; if word order is inconsistent, learning is correspondingly harder. If there are many branching structures, then learning to correctly predict the next item will be difficult, though the model will still be able to learn the transitional probabilities between elements in the sequence. Another influence on learning in simple recurrent networks is the impact of centre-embeddings in structures (Christiansen & Chater, 1999). The number of intervening lexical items between the subject noun and the verb will affect the accuracy of verb agreement, as long-distance dependencies are more difficult to learn. This is one interpretation of the results of Christiansen and Devlin's (1997) study of recursive inconsistencies: Learning […]

For German, we used the NEGRA corpus, composed of 20,000 hand-tagged sentences from German newspapers. For English, we employed the British National Corpus (Burnard, 1995), composed of 100 million words of automatically tagged English (with an estimated error rate of 1.7%, and an additional 4.7% of words given ambiguous tags).

We derived simplified versions of the corpora by focusing on the NP, PP and VP structures. We omitted all words that modified, but did not alter, the NP, PP and VP sentence structure¹. Finally, we omitted any sentences that contained ambiguous tags, unclassified words, numerals, alphabetical letters, existential there, conjunctions or postpositions. These simplifications resulted in 8,814 sentences from the NEGRA corpus and 2,823,034 sentences from the BNC that were composed entirely of nouns, finite verbs, infinite verbs, prepositions and complementisers². Despite the differences in scale and text source, the overall proportions of different structures in the English and German corpora are approximately similar, as shown below in the more detailed analyses. Given the large number of sentences omitted from the corpora, however, the proportions given ought to be taken as a general guide only.
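As a sketch of this preprocessing (our illustration, not the authors' code), suppose each corpus sentence is available as a list of (word, tag) pairs; the tag names below are hypothetical placeholders, not the actual NEGRA or BNC tagsets:

```python
# Map retained word classes to the five category labels kept in the
# simplified corpora; all tag names here are illustrative placeholders.
KEEP = {"NOUN": "N", "VERB_FIN": "Vf", "VERB_INF": "Vi",
        "PREP": "P", "COMP": "C"}
# Tags whose presence causes the whole sentence to be omitted.
DROP_SENTENCE = {"AMBIGUOUS", "UNCLASSIFIED", "NUMERAL", "LETTER",
                 "EX_THERE", "CONJ", "POSTPOS"}

def simplify(tagged_sentence):
    """Return the sentence as a category sequence, or None if the
    sentence contains any excluded material."""
    categories = []
    for word, tag in tagged_sentence:
        if tag in DROP_SENTENCE:
            return None          # whole sentence is discarded
        if tag in KEEP:
            categories.append(KEEP[tag])
        # modifiers that do not alter NP/PP/VP structure are simply skipped
    return categories or None
```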
The corpus data were used to generate sets of sentences that were plausible approximations to the proportions of different phrase structures in English and German. The models we used were simple recurrent networks, or SRNs (Elman, 1990). SRNs are feedforward backpropagation networks with an extra layer of 'context' units that record the previous state of the hidden units.

[Figure 1. The simple recurrent network architecture we used in the simulations. The model is trained to predict the next word in the sentence, given the current word and the context of the previous state of the hidden units.]
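The architecture in Figure 1 is compact enough to sketch directly. The following is a minimal illustration of our own, not the authors' code: an Elman-style SRN trained by next-item prediction over category sequences. The hidden layer size, learning rate, sentence-boundary marker and the truncation of backpropagation at the context copy are all our assumptions.

```python
import numpy as np

class SRN:
    """Minimal Elman-style simple recurrent network: a feedforward
    backpropagation network plus 'context' units holding a copy of the
    previous hidden state (Elman, 1990)."""

    def __init__(self, n_cats, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.Wxh = rng.normal(0.0, 0.1, (n_hidden, n_cats))    # input -> hidden
        self.Wch = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # context -> hidden
        self.Why = rng.normal(0.0, 0.1, (n_cats, n_hidden))    # hidden -> output
        self.context = np.zeros(n_hidden)

    def step(self, x):
        """Forward pass for one item: combine current input with context."""
        self.x = x
        self.h = np.tanh(self.Wxh @ x + self.Wch @ self.context)
        z = self.Why @ self.h
        self.y = np.exp(z - z.max())
        self.y /= self.y.sum()                  # softmax next-item prediction
        return self.y

    def learn(self, target, lr=0.1):
        """Backprop for the current step only, truncated at the context
        copy, then copy the hidden state into the context units."""
        dy = self.y - target                           # softmax cross-entropy
        dh = (self.Why.T @ dy) * (1.0 - self.h ** 2)   # back through tanh
        self.Why -= lr * np.outer(dy, self.h)
        self.Wxh -= lr * np.outer(dh, self.x)
        self.Wch -= lr * np.outer(dh, self.context)
        self.context = self.h.copy()

# Training on category sequences of the kind yielded by the simplified
# corpora; '#' is a hypothetical sentence-boundary marker.
CATS = ["N", "Vf", "Vi", "P", "C", "#"]
eye = np.eye(len(CATS))
net = SRN(n_cats=len(CATS), n_hidden=10)
for epoch in range(100):
    for seq in [["N", "Vf", "N"],                        # SVO main clause
                ["N", "Vf", "N", "C", "N", "N", "Vf"]]:  # verb-final subordinate
        net.context[:] = 0.0                # reset context for each sentence
        ids = [CATS.index(c) for c in ["#"] + seq + ["#"]]
        for cur, nxt in zip(ids, ids[1:]):
            net.step(eye[cur])
            net.learn(eye[nxt])
```

The network's prediction error on each structure then provides the measure of relative learnability: the more consistent an order is with the rest of the training corpus, the sharper the next-category predictions become.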
