Elements of a written interlanguage:
A computational and corpus-based study of
institutional influences on the acquisition of English by Hong Kong Chinese students
John Milton
RESEARCH REPORTS
General Editor: Gregory James
VOLUME TWO
LANGUAGE CENTRE The Hong Kong University of Science and Technology
This report is a shortened, edited version of the author’s thesis, ‘The description of a written interlanguage: Institutional influences on the acquisition of English by Hong Kong Chinese students (a computational and corpus-based methodology)’, for which he was awarded the degree of PhD at Lancaster University, 2000.
Language Centre The Hong Kong University of Science and Technology Copyright © August 2001. All rights reserved. ISBN 962-7607-15-0
Postal Address: Language Centre, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong SAR, CHINA
Telephone: (852) 2358 7880
Facsimile: (852) 2335 0249
Dedication
To Warqa and Kay
Contents
Editorial Foreword
Acknowledgements
Summary
Preliminary notes
  Corpora used in this study
  Interlanguage corpora
  ‘Control’, ‘Target language’ and Standard English corpora
  Qualifications
Chapter One: Distributional features of HK interlanguage
  Introduction
  Rationale for the data used
  Word-class distribution in HKIL
  Variation among ILs, registers and acquisitional sequences
  Similarities between HKIL and SE conversation
  Rhetorical questions
  Repetitiveness
  Co-ordination and subordination
  Dissimilarities between HKIL and SE conversation
  Plural nouns and determiners
  Orders of acquisition and difficulty
  Predicted and observed orders of acquisition and difficulty
  -ing participles
  The marking of possession
  Verb morphology
  Negation
  Summary: A natural order of acquisition or an institutionalised IL?
  Variations between English NS students’ texts and professional texts
  Linguistic features of input
Chapter Two: The grammar of HK interlanguage
  Overt ‘local’ errors in HKIL
  Lexical bundles and templates
  Details of word-class error frequency and distribution
  Noun number, articles and S-V concord
  Noun number
  The articles
  Ø for the
  the for Ø
  Ø for a
  the for a
  S-V discord
  Variant patterns of subordination
  That-complement clauses
  -ing participles and infinitives
  Ungrammatical use of pronouns and subordination
  Information structure and subordination
  Prepositions
  Distributional factors
  Overuse
  Underuse
  Verb arguments
  Verb choice
  Omitted copulas
  Auxiliary BE
  The existential in HKIL
  Summary
Chapter Three: Doubt and certainty in HK interlanguage
  The concept of ‘hedging’
  EFL students’ difficulties in hedging
  Adverbial hedges
  Intensifiers
  Syntactic roles of adverbs
  Adverbs of time and place
  The imposition of coherence and certainty through adverbial connectors
  The expression of epistemic modality by that-complementation
  Degrees of depersonalisation and impersonalisation
  The expression of epistemic modality by modal verbs
  Variations in the expression of doubt and certainty among L2 students
  Epistemic clusters
Chapter Four: Conclusion
  Summary
  Future directions
Bibliography
Appendix 1: 1994 UE, A grade sample examination script
Appendix 2: 1994 UE, D grade sample examination script
Appendix 3: 1994 GS, A grade sample examination script
Appendix 4: Sample Taiwanese learner’s text
Editorial Foreword
Hong Kong’s local education system is not producing students with adequate English language proficiency, charges one of the city’s top business leaders, David Eldon. Standards of English in Hong Kong are falling behind those of neighbouring cities, such as Beijing and Shanghai, according to Eldon, who is chairman of the Hongkong and Shanghai Banking Corporation, one of Hong Kong’s largest employers. Expatriate executives from Hong Kong often make similar charges. The downward slide reportedly began before Britain returned control of the territory to China in 1997. Eldon has asked the government to move faster to provide more visas to English-language teachers from countries such as Australia and the UK. (The Financial Times, 14th December, 2000)
Since I was appointed as [Education] commission chairwoman last month, the declining proficiency in English and Chinese has been the primary area of concern shared by the people I have met … The writing and oral skills in both languages of the new generation has [sic] generally declined. We have to find the reasons. (Rosanna Wong Yick-ming, as reported by Gary Cheung in the South China Morning Post, 3rd May, 2001)
Ever since Professor Roy Harris’ (1989) controversial inaugural lecture at the University of Hong Kong, in which he characterised Hong Kong English as “the worst English in the world”, the theme of ‘declining standards’ has been a mantra in local society (cf. Moody 1997). For long, there was only disputed anecdotal evidence on which to base judgements, and demographic and sociolinguistic arguments were used to shore up defences. In recent years, however, public examination results have tended to lend support to the popular contention that English language ‘standards’ have indeed ‘dropped’.
What ‘standards’ are being referred to is not always altogether clear, but there is certainly a widespread dissatisfaction with many students’ inability to manipulate adequately the mechanics of the language. Hong Kong students’ English has long engaged the attention of language professionals (cf. Yung 1958; Board 1969; Shak 1971; Budge 1986; Chan 1987; Ho 1988; Bunton 1992; Field & Oi Yip 1992; Field 1994; Tang & Ng 1995; Chui 1996), but many investigations have tended to be intuitive, or based on restricted sources.
John Milton’s timely report suggests some of the reasons for the continued existence of certain idiosyncratic features of Hong Kong students’ written English, often characterised as ‘ungrammatical’. He bases his extensive analysis on fresh evidence, gleaned from a substantial corpus of scripts of Hong Kong matriculation examinations (the Hong Kong Examinations Authority’s Use of English Examination), compared with public examination scripts of students of a similar age to the Hong Kong examinees (the University of Cambridge ‘A’ level General Paper). He not only shows that the English interlanguage of Hong Kong students is homogeneous, but also, for the first time, offers analyses, based on frequency counts, to reveal the degree to which this interlanguage diverges from a native standard. By comparing the data from the two populations, he demonstrates the extent to which Hong Kong students overuse, underuse or misuse certain English words and expressions, in comparison to their native-speaking peers. He is thus able to offer a much more precise characterisation of Hong Kong students’ English than has hitherto ever been made.
Milton does not confine himself to a description of the use of isolated words and expressions, but expands his enquiry to include aspects of some of the typical discoursal features evinced in the data, such as patterns of subordination and the expression of epistemic modality. He claims that these and other aspects of Hong Kong students’ interlanguage are systematic, but shows that second-language acquisition theories “have not proven very dependable in predicting or accounting for these observed features in HK learners’ written production”. The general characteristics he highlights are of a local, often stigmatised, variety of English that is perpetuating itself through institutional reinforcement, but he notes that this variety, distinguished by “conservative production strategies”, is “accommodated remarkably well to the demands and constraints of [the students’] educational environment”. He suggests, however, that there is a clear need for teachers and students to become aware of the different types of disparity between Hong Kong interlanguage and Standard English. More adequate descriptions of these differences than have yet been available are needed, to inform curricula, textbook design and classroom pedagogy.
References
Board, M.-W. 1969. An analysis of Chinese learners’ difficulties in writing English. PhD, University of Hong Kong.
Budge, C. 1986. Variation in Hong Kong English. PhD, Monash University.
Bunton, D. 1992. Thematisation and given–new information: Their effect on coherence in Hong Kong secondary student writing. MEd, University of Hong Kong.
Chan, B. K.-H. 1987. Some problems in the written English of lower-sixth form students in Hong Kong. MA, University of Hong Kong.
Cheung, G. 2001. Rosanna Wong says English is key issue. South China Morning Post, 3.5.2001, p. 4.
Chui, H. M. 1996. The criteria employed in writing and judging the quality of written texts: A case study of Hong Kong tertiary students. MA, University of Surrey.
English fluency lags in Hong Kong. The Financial Times, 14.12.2000. [Online.] Available at www.ft.com.
Field, Y. 1994. Cohesive conjunctions in the English writing of Cantonese speaking students from Hong Kong. Australian Review of Applied Linguistics 17, 1, 125–39.
Field, Y. & Oi Yip, L. M. 1992. A comparison of internal conjunctive cohesion in the English essay writing of Cantonese speakers and native speakers of English. RELC Journal 23, 1, 15–28.
Harris, R. 1989. The worst English in the world? Inaugural lecture from the Chair of English Language, 24th April. Supplement to The Gazette 36, 1, 37–46. Hong Kong: University of Hong Kong.
Ho, Y. Y. 1988. A study of the quality of writing of Hong Kong secondary students. BEd, University of Nottingham.
Moody, A. J. 1997. The status of language change in Hong Kong English. PhD, University of Kansas.
Shak, W.-H. 1971. A study of errors in the written English of learners in Anglo-Chinese secondary schools in Hong Kong. MA, University of Hong Kong.
Tang, E. & Ng, C. 1995. A study on the use of connectives in ESL students’ writing. Perspectives 7, 2, 105–22.
Yung, T. T.-Y. 1958. An analysis of the written English of Chinese pupils in Hong Kong. MA, University of London.
Acknowledgements
I am indebted to the students whose production is the subject of this study. The data collection for this analysis would also not have been possible without a series of grants from the Hong Kong University Grants Committee (DAG92/93.LC01; HKUST 514/94H; DAG94/95.LC01). These grants, in turn, enabled me to work with a number of individuals who aided in the preparation of materials for analysis: Warqa Milton, who did most of the transcription of the examination scripts into electronic format; Nandini Chowdhury, who assisted with the manual error tagging; and Robert Freeman, who assisted with the early management of the corpora and helped design UNIX scripts for their analysis.
Several colleagues at the Hong Kong University of Science and Technology (UST), especially Professor Gregory James, have taken an interest in various aspects of this work. Cantonese-speaking colleagues, notably Ms. Candice Poon, have also helped me in my study of Cantonese syntax. I am very grateful to my PhD supervisor, Professor Geoffrey Leech, for his patience and advice. Any shortcomings are of course my own.
John Milton
Summary
This study sets out to identify the main variant features of the written interlanguage (IL) of Hong Kong students of English (HKIL) and to determine sociological and linguistic factors that might help account for the persistence of these features. It describes computational and manual analyses of an electronic corpus of HK students’ texts mainly in comparison to a ‘control’ corpus of UK students’ Standard English (SE) texts.
Three aspects of HKIL are investigated, based on data revealed by part-of-speech (POS) tagging:
1. an identification of the distributional profile of the IL (based on POS categories) – i.e. many of the lexicogrammatical features which can be shown to be characteristic of this interlanguage;
2. those features of SE (determined by POS tagging) that appear to present the greatest ‘learnability’ and production problems for HK learners; and
3. characteristic discoursal features of HKIL (particularly epistemic modality) identified by word class.
The findings of these empirical analyses question second language acquisition (SLA) theories that make strong generalised claims for linguistic constraints on L2 acquisition; for example, that there is a ‘universal and natural order of acquisition’ independent of L1 and instruction.
The IL data suggest instead that these learners are, to a substantial degree, encouraged in the application of compensatory production strategies, often at the expense of acquiring grammatical and communicative competence. These compensatory strategies appear to be one factor hampering the learners’ effective communication of representative and propositional information in English. Moreover, several characteristic interlanguage features appear to be institutionally induced, partly because HK students are misinformed about the properties of, and distinctions between, spoken and written English. The linguistic contexts of ungrammaticality and reduced expression in the L2, including the constrained manner in which these learners are taught to structure information, make clear the need for pedagogy to go beyond error correction in helping learners articulate and reformulate their L2 texts.
Preliminary notes
Corpora used in this study
Interlanguage corpora
Between 1992 and 2000, I collected the writing of Hong Kong (HK) students submitted in electronic form for courses in English as a foreign language (EFL) at the Hong Kong University of Science and Technology (UST). The resulting ‘monitor archive’ 1 is composed of assignments written by students during their three-year undergraduate programme. Since 1997 students have been required to submit electronic copies in text format via e-mail directly to a server, where the files are stored by student ID number. As of January 2001, this archive consisted of about 25 million running words of texts from about 6,000 students (about 40,000 scripts).
The size of such a learner corpus is significant in that, as the collection grows, the number of topics increases, and the influence of the initially relatively limited number of rubrics, topics and tasks on the grammatical patterns and lexis of the texts lessens. As Selinker (1992: 213) observes, “… topic, a semantic/discourse variable, can affect surface syntactic order in both the NL [native language] and IL [interlanguage], and thus in the language transfer process”. My main use of this archive was to check the observations found in the approximately 1.5-million-word HK examination corpus described here.
In addition to the archive described above, I have collected, and have had transcribed into electronic format, a number of other texts written by HK and other Chinese-speaking students.
The Hong Kong Examinations Authority (HKEA) gave me access to the 1992 scripts of the written section of the A-level Use of English Examination (hereafter UE92)2, which is administered to all HK Form 7 school-leavers. I compiled 550 scripts awarded a ‘D’ grade, and another 550 scripts assigned an ‘E’ grade: altogether 1,100 papers that received minimally passing grades – a total of about 600,000 tokens. This examination corpus was transcribed and tagged with the CLAWS part-of-speech tagger (the CL-7 tagset; see Garside et al. 1987).3 About 50,000 words from each grade range (‘E’ and ‘D’) were manually post-edited for POS tagging accuracy, and then manually coded for error.
A second trawl of examination scripts in 1994 (UE94) resulted in the compilation of 1,400 transcribed scripts representing all seven grade levels of the A-level Use of English Examination. This collection contains 200 scripts randomly and evenly selected from each grade level (i.e. ‘A’, ‘B’, ‘C’, ‘D’, ‘E’, ‘F’ and ‘U’, or ‘unclassifiable’): a total of 750,000 tokens. This collection was tagged using the CL-7 tagset. This examination corpus makes possible the investigation of lexis and grammar at various proficiency levels of HK students’ production.4 My immediate reason for undertaking this collection was that by 1994, many local universities were requiring all incoming students to take a first-year EFL course. This 1994 collection allows the investigation of ‘orders of difficulty/acquisition’ discussed in this study, as it represents the full range of written language performance among all HK secondary-school leavers. However, manual analyses in this study are conducted on the less proficient HK student scripts (graded ‘D’ and ‘E’) from the 1992 and 1994 collections, which represent the production of over 50% of HK school-leavers.
1 I will refer to the overall collection of the student assignments as an ‘archive’ in the more general sense of the word, and to any specific collection of texts used for analysis as a ‘corpus’. I realise that this stretches the definition of the words, but I believe it is justified since, although much of the archive of HK students’ texts is an opportunistic collection, it and the other collections I hold were not arrived at indiscriminately.
2 These examinations have been rigorously standardised: an ‘E’ grade is roughly equivalent to a TOEFL score of 450, and an ‘A’ to 600 (Hogan & Chan 1993: 6).
3 The tagset is listed at http://www.comp.lancs.ac.uk/computing/research/ucrel/claws/.
4 That is, proficiency as measured by grades assigned on standardised and norm-referenced examinations.
In co-operation with colleagues teaching in Taipei,5 I also acquired assignments written in undergraduate EFL courses at several Taiwan universities, collected and transcribed over a thirteen-year period between 1985 and 1998. This collection consists of about 500,000 words of texts written by some 1,000 students. Although there are a number of differences between this and the HK learner corpora described above, the two IL corpora provide an opportunity to compare the written English of two cohorts of Chinese speakers who have essentially the same first-language (L1) written system, and very similar L1 grammars. All the HK students are native speakers of Cantonese and have had at least 15 years of formal schooling in mixed code (English and Cantonese) beginning from a very early age. Taiwanese students, on the other hand, are educated in Putonghua (although at home they may speak one or more dialects – see Ramsey 1987: 107–15) and have had much less, and much later, instruction in English. While the HK students’ texts are graded by proficiency, the Taiwanese students’ texts are not, although they clearly represent a range of proficiency levels. The comparison of these two student cohorts is not central to this study, but any differences we discover will be interesting, especially for the light they might shed on L1 transfer theories. No studies that I know of have compared English texts written by Cantonese and Putonghua speakers – or even distinguished between the interlanguage dialects.
Another collection of texts written in English by second-language (L2) writers, and to which I will occasionally compare the HK students’ texts, is the Longman Learners’ Corpus (generously made available by Pearson Education, UK). The version I have consists of about 3,500,000 words, including some 32,000 words from ‘Chinese speakers’.
These collections of IL data allow the written English texts of HK students at various proficiency levels to be compared with each other, with the academic writing of other L2 student writers of English (especially undergraduate students in Taiwan), and with the native-speaker (NS) text corpora described below.
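As a rough illustration of the kind of comparison these collections permit, the sketch below normalises word-class counts to frequencies per 1,000 tokens and reports the ratio between a learner sample and a control sample. It is not the procedure used in this study: the function names, the simplified tag labels (rather than the CLAWS tagset) and the toy tag sequences are all assumptions made for the example.

```python
from collections import Counter


def per_thousand(tags):
    """Return each tag's frequency per 1,000 tokens."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: 1000 * n / total for tag, n in counts.items()}


def usage_ratios(learner_tags, control_tags):
    """Ratio of learner to control relative frequency, for tags found in both."""
    learner = per_thousand(learner_tags)
    control = per_thousand(control_tags)
    return {tag: learner[tag] / control[tag] for tag in learner if tag in control}


# Toy, invented tag sequences (hypothetical; not data from the study).
learner_sample = ["NOUN", "VERB", "CONJ", "NOUN", "CONJ", "PRON", "VERB", "CONJ"]
control_sample = ["NOUN", "VERB", "DET", "NOUN", "CONJ", "PRON", "VERB", "DET"]

ratios = usage_ratios(learner_sample, control_sample)
for tag, ratio in sorted(ratios.items()):
    print(f"{tag}: {ratio:.2f}")
```

A ratio above 1 suggests relative overuse by the learner sample and a ratio below 1 relative underuse; word classes missing from either sample are left out rather than given an undefined ratio.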
‘Control’, ‘Target language’ and Standard English corpora
I distinguish in this study between Standard English (SE) and the target language (TL) that HK learners are generally encouraged to emulate at school. HK learners are seldom exposed to, and seldom have access to, authentic registers of SE in primary and secondary school. The ostensible TL that they are most often exposed to in EFL classrooms varies in its approximation to SE, from a grammatically accurate but greatly simplified register of ‘teacher talk’, to a variety that is often closer to the grammar of the interlanguage dialect than to SE.
The standard corpora of professionally edited and expertly written native speakers’ texts – e.g. the Brown and London-Oslo-Bergen (LOB) corpora, and even the more recent British National Corpus (BNC) – do not provide appropriate controls for a corpus of the writing of undergraduate language learners. To provide a more appropriate control, I obtained (thanks to the University of Cambridge Local Examinations Syndicate), and had transcribed into electronic format, 770 school-leaving examination essay scripts produced by 110 students in the UK who had received grades of ‘A’ and ‘B’ on the 1994 A-level General Studies Examination. This ‘GS’ corpus (of about 510,000 tokens) represents relatively proficient academic writing by English NS (or near NS) students, whose ages and academic background are similar to those of the HK students. These UK students’ texts constitute an SE corpus comparable in genre to the HK students’ texts, as both corpora contain argumentative/discursive essays written under examination conditions. I treat the UK students’ texts as a control group, without suggesting that these texts are, or should be, a target for EFL learners, since even the most proficient NS students are still novice writers.6 The topic, sex of author and assigned grade of each script in all three examination corpora are identifiable by file name.
5 Colman Bernath (Soochow University) and Ting-Kun Eric Liu (University of Newcastle, UK). See Appendix 4 for a sample text.
In addition to this ‘control corpus’, written by NS counterparts of the HK learners, at least two types of TL corpora are useful for this type of study. One is the written English to which the learners are most likely to have been exposed. I have acquired four such collections of types of texts that HK students are likely to have had presented to them as models: