Lexical Intuitions and Collocation Patterns in Corpora
Total Page:16
File Type:pdf, Size:1020Kb
Lexical Intuitions and Collocation Patterns in Corpora by Iain David McGee Centre for Language and Communication Research School o f English, Communication and Philosophy Cardiff University Thesis submitted for degree of Ph.D. Cardiff University June 2006 UMI Number: U584816 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. Dissertation Publishing UMI U584816 Published by ProQuest LLC 2013. Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. ProQuest LLC 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, Ml 48106-1346 DECLARATION This work has not previously been accepted in substance for any degree and is not being concurrently submitted in candidature for any degree. Signed.. Date STATEMENT 1 This thesis is the result of my own investigations, except where otherwise stated. Other sources are acknowledged by footnotes giving explicit references. A bibliography is appended. Signed Date 2 -& . STATEMENT 2 I hereby give consent for my thesis, if accepted, to be available for photocopying and for inter-library loan, and for the title and summary to be made available to outside organizations. Signed Date Thesis Abstract Language teachers are often called upon by their students to provide examples of vocabulary usage in the classroom. Drawing on their experience of language, these teachers model lexical combinations and collocations, not only in their classes, but also in materials writing. However, corpus linguists have claimed that native speaker intuitions about the typical collocates of words are not reliable, because they do not align with the patterns observed in large corpora. These claims are critically evaluated, and an alternative explanation for the mismatch, the possibility that the corpora might not be representative of actual language in use, is also examined. Various linguistic and psycholinguistic explanations for the disparity between corpus data and elicited data are examined, and theories dealing with the mental representation of collocations are also discussed. Data from word frequency estimate research, and word association research are also analysed for relevant information on the subject. Five experiments are then reported, investigating the ability of native speakers (students and EFL teachers) and non native speakers (Arab university teachers) to rank, recognize and spontaneously produce frequent adjective-noun collocations. The results indicate that a key factor affecting the ‘quality’ of lexical intuitions may be the employment of an ‘availability heuristic’ in judgements of frequency. It is argued that some collocates of words may be more hidden from memory searches than others, and that there may be a systematic bias in the respondants’ lexical intuitions based on how words are stored in the mental lexicon. Conclusions are drawn that reflect the many facets of research relevant to the questions under discussion: corpus linguistics, frequency theory, word association research, learning theory and theories of lexical storage. The thesis ends in applying some of the key findings to language teachers. Acknowledgments I would like to thank Prof. Alison Wray for her help and support during the writing of the thesis. She has shown great patience and given much encouragement along the way: she has been a model supervisor. I would also like to thank Gaetanelle Gilquin for alerting me to some relevant research on elicited data and corpus data research. The libraiy staff at King Fahd University of Petroleum and Minerals, Saudi Arabia, have been very helpful in obtaining copies of hard-to-come-by journal articles and books - I am thankful for their help and hard work. Without the respondents giving me their time, this research could not have been conducted and I gratefully acknowledge their willingness to participate in the experiments which are reported in this thesis. Last, but not least, I should like to thank my wife and children for their patience and support, and for giving me the time to put in the many hours that have been spent on this project. This thesis is dedicated to them. Table of Contents Declaration Thesis Abstract Acknowledgments Chapter 1 The Claims of Corpus Linguists against Lexical Intuitions ............ 1. Introduction 2. Background 3. Specific claims 3.1. Collocation 3.2. Frequency 3.3. Semantic prosody and pragmatics 3.4. Phraseology 3.5. The true place for intuitions 4. Explanations for data differences 4.1. Connotation, denotation and delexicalisation 4.2. Saliency 5. Summary Chapter 2 Corpus Representativeness: Establishing the Parameters of Usage 1. Introduction 2. Objection 1: Different registers different data 2.1. Word frequencies 2.2. Word meanings 2.3. Collocations 2.4. Spoken and written English 2.5. National Englishes 3. Objection 2: Representativeness 3.1. Size 3.2. Content 3.3. The BNC 4. Objection 3: The Internet as a super-corpus 5. Objection 4: The dating of a corpus 6. Objection 5: Corpus material and significance 7. Objection 6: Different corpora different data 7.1. Comparisons o f ‘traditional’ corpora 7.2. Comparisons of ‘traditional’ corpora and the Internet. 7.2.1. Frequencies of words 7.2.2. Frequencies of multi-word items 8. Summary Page No. Chapter 3 Collocation Classification and Psycholinguistic Representation .................................63 1. Introduction 2. Collocation: definition and classification 2.1. Co-occurrence of words 2.2. Frequency 2.3. Restrictedness 2.4. Selectional restrictions and embedded collocations 3. Categorising adjectives 4. Collocation representation: psycholinguistic perspectives 4.1. Background 4.2. Psycholinguistic explanations for data differences 4.3. Wray’s formulaic language model 4.4. Evidence for the existence of fused language or formulaic language 5. Summary Chapter 4 Word Frequency Estimation and Frequency Estimation Theory ...............................94 1. Introduction 2. Frequency estimation research 2.1. Word frequency 2.1.1. Methodological issues 2.1.2. Different corpora different results? 2.1.3. Priming effects 2.2. Collocation frequency 2.3. Miscellaneous 3. The coding of and access to frequency information 3.1. Coding of frequency information 3.1.1. Indirect and direct coding 3.1.2. The automatic encoding of frequency 3.2. Access to frequency information 4. Summary Chapter 5 Ranking Restricted and Free Collocations ...................................................................116 1. Introduction 2. Statistical measures of collocation strength and raw frequency co-occurrence data 3. Experiment 1 3.1. Rationale for experiment and hypotheses 3.2. Experiment design 3.2.1. Choice of adjectives and collocations 3.2.2. Differences in frequency between test items 3.2.3. Collocation frequency and raw noun frequency 3.2.4. The sets for testing 3.3. Methodology 3.4. Subjects 3.5. Results and analyses 3.6. Discussion: testing the hypotheses 3.6.1. Evidence of embedding effects 3.6.2. Evidence of saliency effects 3.6.3. Miscellaneous explanations 4. Summary Chapter 6 Insights from Word Association Studies ............................................. 1. Introduction 2. Review of word association testing 2.1. Background 2.2. Methodological issues 2.2.1. Controlled association and free association 2.2.2. Single and multiple responses 2.3. Types of stimuli and response classification 2.3.1. Stimulus type and its effect on response type 2.3.2. Response types: paradigmatic, syntagmatic, clang 2.4. Native speaker and non-native speaker responses 2.4.1. Native speakers 2.4.2. Non-native speakers 2.4.3. Developmental issues 3. Adjective-noun collocations in word association tests 3.1 Introduction and research questions 3.2. Analysis of Moss & Older (1996) data 3.2.1. Adjective stimuli and responses 3.2.1.1. Analysing the data 3.2.1.2. Results 3.2.1.3. Discussion 3.2.2. Noun stimuli and responses 3.2.2.1. Analysing the data 3.2.2.2. Results 3.2.2.3. Discussion 4. Summary Page No. Chapter 7 Testing Productive and Receptive Knowledge of Noun Collocates of Frequent Adjectives ..................................................................192 1. Introduction 2. Experiment 2 2.1. Main research question and hypothesis 2.2. Subjects 2.2.1. Group 1 - Native speakers (NSs) 2.2.2. Group 2 - Non-native speakers (NNSs) 2.3. Designing the experiment 2.3.1. Justification for methodology adopted 2.3.2. Criteria affecting the choice of word stimuli 2.3.3. The issue of lemmatisation 2.3.4. The adjective stimuli 2.4. Method 2.5. Quantitative analyses - justification for choice of test 2.6. Results 2.6.1. Native speakers 2.6.1.1. Analysis 1 - Excluding ties 2.6.1.2. Analysis 2 - Retaining ties 2.6.2. Non-native speakers 2.6.2.1. Analysis 1 - Excluding ties 2.6.2.2. Analysis 2 - Retaining ties 2.6.3. Comparison of native speaker and non-native speaker responses 2.6.3.1. Quantitative analyses 2.6.3.1.1. Number of different responses 2.6.3.1.2. Comparison of NS and NNS group ranks 2.6.3.2. Qualitative analyses 2.6.4. The dominant responses 2.7. Discussion 3. Experiment 3 3.1. Main research interest and hypothesis 3.2. Designing the experiment 3.3. Method 3.4. Subjects 3.5. Results 3.5.1. Native speakers 3.5.1.1. Analysis 1 (Majority choice) 3.5.1.2. Analysis 2 (Votes) 3.5.2. Non-native speakers 3.5.2.1. Analysis 1 (Majority choice) 3.5.2.2. Analysis 2 (Votes) Page No. 3.6. Discussion 4. Summary Chapter 8 Testing Productive and Receptive Knowledge of Adjective Collocates of Frequent Nouns ..................................................................252 1. Introduction 2. Experiment 4 2.1. Rationale for experiment 2.2. Research questions 2.3.