Grammaticality Judgements, Intuitions and Corpora 175-015 Syntax
Catherine Lai
June 2004

1 Introduction

The syntactician’s toolkit relies heavily on grammaticality judgements. Tests determining the structure of language often involve transforming a language fragment and then checking the grammaticality of the result; distributional tests for constituency are one example. However, determining grammaticality is not always a straightforward task.

Tests for grammaticality have been characterized by the two forms of evidence used. Native speaker intuitions have been the preferred evidence in the past fifty years of syntactic inquiry. On the other hand, frequency data obtained from corpora have gained increasing support in recent times. In this essay, I review native speaker intuitions and corpus-derived frequencies and their appropriateness as data for grammaticality testing. Finally, I consider how they may be used together to form a stronger empirical base of evidence for grammaticality.

1.1 Competence and Performance

The divide between these two sets of evidence has been driven by the theory that grammar is autonomous within the human mind. The theory was propelled into the mainstream by Chomsky¹ in the 1950s (Chomsky, 1957). The corollary of this is that our use of language (performance) does not accurately reflect our internal knowledge of language (competence). Grammar is assumed to be part of competence. Corpus-derived data reflects performance and so is not relevant to questions of grammaticality. On the other hand, native speaker intuitions, unpolluted by performance factors, could be.

¹ Note, this theory did not originate with Chomsky; see Derwing (1979).

Not everyone accepts this as the truth. Corpus linguistic work with respect to syntax continued through the peak of transformational generative grammar (McEnery & Wilson, 1996). Labov (1975) has argued the case for empiricism.
Recently, Abney (1996) has argued eloquently that the competence-performance distinction is really just an ‘idealization of language for the sake of simplicity’. Similar thoughts have been expressed in the past, for example by Lakoff (1974). Despite this, mainstream linguistics has continued to accept intuitions as the primary data source for grammar and to reject corpus-based evidence (Schutze, 1996, pg. 35). It is with this in mind that we must consider the use of native speaker intuition.

2 Native Speaker Intuitions about Grammaticality

Using intuitions to test grammaticality is very simple on the surface. An informant’s intuition defines whether a sentence is grammatical. The linguist presents the sentence under question and asks the appropriate question. This is consistent with a generative grammar that can take a string of words and output its grammaticality status (e.g. true or false). Since the real grammar of a language exists inside the native speaker, these intuitions should accurately reflect competence.

It is generally agreed that intuitions have enabled linguistics research to reach areas outside the scope of a purely corpus-driven approach (Labov, 1975; Newmeyer, 2003; Sampson, 1975). Moreover, the linguist can focus on relevant material with great ease and speed. In theory, intuition allows access to data from an internal corpus of infinite size. Also, questioning can provide negative information about grammar that is absent from corpora. However, the decisive reason to use intuitions is the link to competence. The strength of this link needs to be examined.

2.1 Sentence Judgements and Competence

The elicitation of intuitions is done via judgements of sentences that are clearly affected by performance issues. Some sentences are clearly unacceptable while still grammatical. This can be caused by semantics, processing limitations, context and many other traditional performance factors. In these cases acceptability judgements are performance data.
The line between what is ungrammatical and what is merely unacceptable is extremely blurred. Reactions to this problem generally call for filtering of affected data. Bever (see Schutze, 1996, pg. 31) suggests ungrammaticality is only found when unacceptability cannot be explained by performance. According to Chomsky (1965), people are incapable of making judgements about grammaticality since they have no direct access to grammatical knowledge. However, intuition and tests might shed light on the situation anyway. In any case, using sentence judgements to study competence means performance factors need to be stripped away. This is not easy, especially since performance is not precisely defined. The linguist might appear highly qualified to provide judgements that take performance into account. However, allowing linguists to create the data used to test their hypotheses can result in more distortion than remedy.

2.2 Data Distortions

Native speaker intuitions are susceptible to bias. The source of intuitions has often been the linguist seeking to validate a theory. However, it has been shown that linguists have different ideas about language to non-linguists (Labov, 1975, pp. 14-18). In fact, Newmeyer (1983) claims that for the reasons above only linguists should produce intuitions. However, this overlooks how easy it can be to ignore counter-evidence (Sampson, 1975; Manning, 2003). As Schutze (1996, pp. 49-50) wisely quips: ‘Except in those cases where they fail to suit the linguist’s purpose, subjects’ intuitions are taken to reflect their true linguistic knowledge.’

When informants are non-linguists, elicitation of intuitions needs to be carefully controlled to make data reliable and free of external factors (Botha, 1981, pg. 304). Cowart (1997) notes that linguistic background, knowledge of the experimenter’s intentions, overexposure to a sentence type and sentence length can all change the way informants behave.
A myriad of methods have been suggested to try to limit the influence of the experimenter, for example, asking the informant to transform sentences and seeing whether the part under question is changed (if it is grammatical it should not be) (Schutze, 1996, pg. 57). Indeed, Cowart (1997) shows that when experiments are designed to control variance, sentence judgements are reasonably reliable. Unfortunately, it does not appear that such strict methodologies are mainstream practice.

Evidence has been presented in Labov (1975) that informants may judge sentences as ungrammatical even if they frequently use them in everyday life. A corollary of this is that a person’s intuitions of grammaticality may differ from their actual internal grammar. The justification for using intuitions is highly dependent on their unique ties to internal grammar. If Labov’s evidence is accepted then it is possible that intuitions are no closer to competence than corpus data.

2.3 Conflict Resolution

Resolving disputes over grammaticality is difficult when the deciding factor is an intuition. The generativist motto has been to avoid this altogether and deal only with clear cases when developing theories (Chomsky, 1957). Newmeyer (1983) claimed that most reported data disputes were actually the result of applying a grammar to an unclear case. However, Schutze (1996) presents cases where ignoring inter-speaker variation has indeed led to data disputes. Others, such as Abney (1996), have claimed that you can make almost anything grammatical if you try, so clear cases are few and far between.

To confuse matters more, evidence suggests that grammaticality is a continuous scale (Manning, 2003). Chomsky (1965, pp. 10-11) concedes that ‘grammaticalness is, no doubt, a matter of degree’. Breaking some rules is worse than breaking others. However, a scale for intuitions has not been agreed on, which makes it virtually impossible to compare results from different studies.
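
The comparison problem can be made concrete with a small sketch. It is not drawn from any of the studies cited above: the ratings, the two scales and the function name standardize are all invented for illustration, and z-score standardization is only one assumed way of putting ratings from different scales on a common footing.

```python
from statistics import mean, pstdev


def standardize(ratings):
    """Return z-scores: each rating minus the mean, divided by the standard deviation."""
    mu = mean(ratings)
    sigma = pstdev(ratings)  # population standard deviation of this set of ratings
    if sigma == 0:
        raise ValueError("ratings do not vary, so they cannot be standardized")
    return [(r - mu) / sigma for r in ratings]


# Invented ratings for the same four sentences in two hypothetical studies.
study_a = [5, 4, 2, 1]  # 5-point scale, 5 = fully acceptable
study_b = [7, 5, 3, 1]  # 7-point scale, 7 = fully acceptable

# Raw scores are not directly comparable across the two scales;
# the standardized scores at least share a mean of 0 and a spread of 1.
for label, ratings in (("A", study_a), ("B", study_b)):
    print(label, [round(z, 2) for z in standardize(ratings)])
```

Standardizing removes differences in the range of the scale, but not differences in instructions or in what informants take the scale points to mean, so on its own it does not settle the problem.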
The key question is whether claims of grammaticality based on intuitions can be contradicted by other intuitions. Unless one set can be shown to be irrelevant or more contaminated by performance, the answer appears to be no. The danger is that there is a temptation to attribute any conflict to performance. The alternative is a stalemate that persists until the theory changes. This does not appear to be a satisfactory situation. Given this state of affairs it seems well worth considering the usefulness of a corpus approach.

3 Frequency Data as Evidence of Grammaticality

Testing grammaticality involves extracting frequencies of the sentence in question. Theoretically, higher frequencies mean a higher probability of grammaticality, and vice versa for low frequencies. The situation where a sentence has not occurred in the corpus is discussed in section 3.2. The resulting distribution can be easily conditionalized on context and meaning (Manning, 2003). More complicated methods (for example distributions over parts of speech) can also be used to determine grammaticality probabilistically.

The main advantage of this methodology is that the data is public and verifiable. Tests are repeatable, less dependent on the linguist, and can undergo greater scrutiny. This means that different approaches are easier to compare.

The nature of corpora has changed since the objections of the 1950s. Machine-readable corpora are becoming larger and tools are being developed to make searching easier than ever. This increased usability has been one of the driving forces behind the resurgence in corpus-based linguistic methodologies (McEnery & Wilson, 1996).

3.1 Frequencies of Performance

Accepting that frequency data reveals grammaticality requires a paradigm shift. While they contain some