
INFORMATION TO USERS

The most advanced technology has been used to photograph and reproduce this manuscript from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer.

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction.

In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps. Each original is also photographed in one exposure and is included in reduced form at the back of the book.

Photographs included in the original manuscript have been reproduced xerographically in this copy. Higher quality 6" x 9" black and white photographic prints are available for any photographs or illustrations appearing in this copy for an additional charge. Contact UMI directly to order.

UMI University Microfilms International, A Bell & Howell Information Company, 300 North Zeeb Road, Ann Arbor, MI 48106-1346 USA, 313/761-4700, 800/521-0600

Order Number 9111680

A quantitative study of sociolinguistic variation in Cantonese

Bourgerie, Dana Scott, Ph.D.

The Ohio State University, 1990

UMI 300 N. Zeeb Rd. Ann Arbor, MI 48106

A QUANTITATIVE STUDY OF

SOCIOLINGUISTIC VARIATION IN CANTONESE

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate

School of the Ohio State University

by Dana Scott Bourgerie, B.A., M.A.

***

The Ohio State University

1990

Dissertation Committee:

James H.Y. Tai
Marjorie K-M. Chan
Timothy Light
Robert M. Sanders
Arnold M. Zwicky

Approved by

Adviser
Department of East Asian Languages and Literatures

To the Memory of My Parents

ACKNOWLEDGEMENTS

During my work on this dissertation I have benefited from the expertise of a truly diverse group of scholars. In the spirit of a true committee they all provided me with good advice. Some of that advice I have incorporated here, some I have yet to absorb. I express my appreciation to my committee chair, Professor James Tai, who has directed my project from its beginning. I also express my thanks to my other committee members: Professors Marjorie Chan, Timothy Light, Robert Sanders and Arnold Zwicky. A special thanks to Marjorie Chan who, in effect, served as co-advisor.

During most of my field work in Hong Kong during 1988-89, I was supported by a Fulbright Fellowship. The fellowship allowed me to focus exclusively on the often daunting task of collecting and transcribing the data on which this dissertation is based. In addition, I was provided with office space and academic support by the Department of Anthropology at the Chinese University of Hong Kong. I express my thanks to Dr. Chao Chien, the department chair, for arranging for my stay there as a visiting researcher. I also express my appreciation to the other faculty and to the administrative staff for their help and generosity. I would particularly like to thank Dr. Eric Zee for his consistent support during all stages of my research in Hong Kong and for the many kindnesses that he showed me.

During the early stages of my research I was provided with office and academic support in the Department of Languages at the City Polytechnic of Hong Kong. I am grateful for the encouragement and advice given to me by the then head of the department, Dr. Benjamin T'sou.

I also benefited greatly from many people outside the academic community. I express my gratitude to the members of our Tai Po church congregation for the warm hospitality and friendship they showed to my family and me. Along with their hospitality, they willingly agreed to participate in my project and allowed me to record their speech. Truly, without their willingness I could not have carried out my research.

Thanks also to Chan Fung Fong for her tireless and excellent work in helping me collect speech data. Her efforts not only increased my efficiency but added greatly to the richness and breadth of my data as well.

Lastly, I express my profound gratitude to my wife Kathryn and to my daughters Nicole and Anne for their unfailing support throughout my entire stay here. They have sacrificed much and endured our journeys well. I could not have traveled them alone.

VITA

June 11, 1956 ...... Born - Minneapolis, Minnesota

1982 ...... B.A., The University of Minnesota, Minneapolis, Minnesota

1987 ...... M.A., The Department of East Asian Languages and Literatures, The Ohio State University, Columbus, Ohio

1988-1989 ...... Visiting Researcher, The Department of Anthropology, The Chinese University of Hong Kong

1983-1990 ...... Teaching Associate, East Asian Languages and Literatures, The Ohio State University, Columbus, Ohio

FIELDS OF STUDY

Major Field: East Asian Languages and Literatures

TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGEMENTS
VITA
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER

I. INTRODUCTION
    1.0 Introduction
    1.1 Scope
    1.2 Romanization, Conventions and Terminology
    1.3 Aims and Goals
    1.4 Description of the Language Situation
        1.4.1 Chinese Dialects: Traditional Classification
        1.4.2 The Yue (Cantonese) Group
        1.4.3 Hong Kong Speech Community: Historical Setting and Current Situation
    CHAPTER I NOTES

II. A REVIEW OF SELECTED LITERATURE ON SOCIOLINGUISTIC VARIATION
    2.0 A General Review of Some Relevant Quantitative Work
    2.1 Key General Works on Variation on English
        2.1.1 Selected Review of the Works of Labov
        2.1.2 Milroy's Social Networks
    2.2 Work on Chinese
        2.2.1 Yeung's Study on Cantonese
        2.2.2 Pan's Study of Prestige Forms in Hong Kong
        2.2.3 Bauer's Hong Kong Study
    CHAPTER II NOTES

III. DISCUSSION OF THE VARIABLES
    3.0 Discussion of Variables
    3.1 Sociolinguistic Variables
        3.1.1 Cantonese Initials
        3.1.2 Tonal Inventory of Standard Cantonese
        3.1.3 Initials n-/l-
            3.1.3.1 n-/l- Variation in Chinese Dialects
            3.1.3.2 Initials n-/l- in the Delta
            3.1.3.3 Initials n-/l- in Hong Kong Cantonese
        3.1.4 Velar Nasal and Zero Initials
            3.1.4.1 Initials ng-/0- in the ...
            3.1.4.2 Initials ng-/0- in Hong Kong Cantonese
        3.1.5 Initials k-/h- in Third Person Pronoun
            3.1.5.1 Pronouns only
            3.1.5.2 Comparisons to Other Chinese Pronouns
    3.2 Independent Variables
        3.2.1 Social and Speaker Variables
        3.2.2 Contextual Variables
        3.2.3 Linguistic Variables
        3.2.4 Other Factors
    CHAPTER III NOTES

IV. METHODOLOGY AND RESEARCH DESIGN
    4.0 Methodology and Research Design
    4.1 Sampling Design
        4.1.1 Random and Judgmental Sampling
        4.1.2 Sample Size
        4.1.3 Token Numbers
        4.1.4 Suggested Length of Speech Segments
    4.2 Composition of the Sample
        4.2.1 Main Sample: Church Groups
        4.2.2 Academic Associations
        4.2.3 Additional Data
    4.3 Investigator's Role
        4.3.1 Participant-Observer
        4.3.2 Investigators' Role
    4.4 Data Collection
        4.4.1 Methods
        4.4.2 Steps in Data Collection
    4.5 Method of Analysis
    CHAPTER IV NOTES

V. ANALYSIS OF RESULTS
    5.0 Introduction
    5.1 Analysis of Statistical Results
        5.1.1 The Variable ng-/0-
            Variable ng-/0-: Historical *ng-
            Variable ng-/0-: Historical *0-
        5.1.2 The Variable n-/l-
        5.1.3 The Variable k-/h-
        5.1.4 Other Correlations: Dependent Variables
    5.2 Notes on Non-Statistical Observations
    5.3 Anomalous Tokens
    5.4 Comparison to Previous Studies
    CHAPTER FIVE NOTES

VI. CONCLUSIONS
    6.1 Main Findings and their Implications
        6.1.1 Variation by Sex
        6.1.2 Age as a Factor; Sound Change or Age Grading
        6.1.3 Dialect Mixture
        6.1.4 The Lexicon and Variation
        6.1.5 Speech Register as a Variable
    6.2 Other Notes on Outcome
    6.3 The Group and the Individual
    6.4 Future Directions
    CHAPTER VI NOTES

APPENDICES
    A. Subject Background Data Sheets: Chinese and English Versions
    B. Interview Questions: Chinese and English Versions
    C. Individual Data and Results: (Main Data Base)
    D. Group Statistical Results: Main Data Base
        Results by Social Background: Descriptive Data
        Analysis of Variance Tests for Main Factors
    E. Individual Data and Results: (Data Base 2: Speech Register)
    F. Group Statistical Results: (Data Base 2: Speech Register)
    G. List of Individual Tokens: Romanization, Characters and Parts of Speech
    H. Chinese Character List for Place Names

LIST OF REFERENCES

LIST OF TABLES

TABLE

1. Yuan's Population Figures by Major Dialect Groups
2. Dialect Groups and the Provinces in which they are Spoken
3. Five Sub-Groups of the Yue Dialects According to Place Spoken
4. Hong Kong Population by Place of Origin
5. Place of Origin and Usual Language
6. Percent Reporting Conservative Form Texts as Better
7. Sex/Age Group Scores For Conservative-Form Production Across Three Contextual Styles
8. Percentage Hypercorrection Between Sex and Age Group (*l > [n-])
9. Yue () Initials
10. Tonal Inventory of Standard Cantonese
11. Realization of Historical *ng- in Pearl River Dialects
12. Realization of Historical *0- in Pearl River Dialects
13. Prescribed Realizations of Selected ng-/0- Lexis from Twelve Sources
14. Pronoun Forms in the Pearl River Delta
15. Summary of Independent and Dependent Variables
16. Distribution of Informants in Main Data Base by Sex and Age Group
17. Distribution of Congregation by Sex and Age Group
18. Distribution of Hong Kong Population by Sex and Age Group
19. Mean [0-] Frequencies by Sex, Age, Education and Place of Origin (*ng-)
20. Mean [0-] Frequencies by Sub-groups for Sex, Age and Educational Level (*ng-)
21. Mean [0-] Frequency by Speech Register (*ng-)
22. Chisquare Analysis for ng-/0- by Word Class (*ng-)
23. Mean [0-] Frequencies by Sex, Age, Education and Place of Origin (*0-)
24. Mean [0-] Frequencies by Sub-groups for Sex, Age and Educational Level for [*0-]
25. Mean [0-] Frequencies by Speech Register (*0-)
26. Mean [l-] Frequencies by Sex, Age, Education and Place of Origin (*n-)
27. Mean [l-] Frequencies by Sub-groups for Sex, Age and Educational Level
28. Mean [l-] Frequency by Speech Register
29. Chisquare Analysis for n-/l- by Word Class
30. Mean [h-] Frequencies by Sex, Age, Education and Place of Origin
31. Mean [h-] Frequencies by Sub-Groups for Age, Sex and Education Level
32. Mean [h-] Frequencies by Speech Register
33. Correlations between Dependent Variables
34. Percent [0-] Scores from Yeung's Study by Age and Sex (*ng-)
35. Percent [0-] Scores from Yeung's Study by Age and Sex (*ng-)
36. Percent [0-] Scores from Yeung's Study by Age and Sex (*0-)
37. Mean Frequencies by Interviewer

LIST OF FIGURES

FIGURE

1. Provinces of China
2. Distribution of Chinese Dialect Groups in China
3. Map of the Yue-Speaking Region of Guangdong and Guangxi Provinces
4. Map of the Territory of Hong Kong
5. Dialect Map of the Pearl River Delta: Distribution of n-/l-
6. Dialect Map of the Pearl River Delta: Distribution of ng-/0-
7. [0-] Frequencies by Word Class (*ng-)
8. [l-] Frequencies by Word Class (*n-)
9. Variable Frequencies by Sex
10. Mean Variable Frequencies by Age
5. Dialect Map of the Pearl River Delta: Distribution of n-/l- (repeated)
8. [l-] Frequencies by Word Class (*n-) (repeated)
7. [0-] Frequencies by Word Class (*ng-) (repeated)
11. Mean Variable Frequencies by Speech Register
12. Mean Variable Frequencies by Education Level
13. Mean Variable Frequencies by Interviewer

LIST OF ABBREVIATIONS

Consonants

Yale Romanization    IPA
b                    p
p                    p'
m                    m
n                    n
l                    l
f                    f
d                    t
t                    t'
j                    tʃ
ch                   tʃ'
s                    ʃ
g                    k
k                    k'
h                    h
gw                   kw
kw                   k'w
w                    w
ng                   ŋ
0                    Ø
y                    j

Vowels

Yale Romanization    IPA
a                    ɐ
aa                   a:
e                    ɛ
eu                   œ
o                    ɔ
i                    i
u                    u
yu                   y
m                    m
ng                   ŋ

Tones

High Level/High Falling    55/53
Mid (high) Rising          35
Mid Level                  33
Low Falling                21
Low Rising                 24
Low Level                  22

*non-initial h designates low speech register in Yale romanization

Parts of Speech

adj    adjective
adv    adverb
av     auxiliary verb
cv     coverb
dem    demonstrative
loc    locative
n      noun
pl     place
pro    pronoun
rv     resultative verb
sv     stative verb
tw     time word
v      verb
vo     verb-object

CHAPTER I
INTRODUCTION

1.0 INTRODUCTION

This report is a sociolinguistic study of variation in a Chinese dialect, namely Cantonese. Within the context of Chinese linguistics the above statement might seem a little curious, since traditionally the study of variation in Chinese has meant the study of a non-official dialect as compared to the official Mandarin dialect. This study differs from most previous work done on Chinese dialects in that it is not primarily concerned with areal variation but with variation correlated with social factors. Traditional dialect studies in Chinese as in the West generally have attempted to locate an ideal speaker to characterize a given variety of language. Typically, that speaker would be male, elderly, a rural dweller and with little or no contact with speakers of other dialects or languages.

In the last twenty or thirty years, however, a substantial number of studies have come forth that focus on non-areal factors in variation such as sex, social class and speech register (situational context). With some notable exceptions, this literature is limited to Western languages and only a small number of studies have been carried out on Chinese. Those reports that do exist mostly concern Cantonese.

This thesis analyzes natural speech data collected in Hong Kong between December 1988 and March 1989. The central aim of the study is to investigate the effect of independent non-linguistic variables on a set of three sociolinguistic variables: n-/l-, ng-/0- and k-/h-. For the independent variables, focus is placed on the variables of sex, age, and speech register. In addition, a number of other factors are analyzed, including interviewer, social group, ethnicity (place of family origin) and education, as well as the linguistic variable of lexical category.

This study is divided into six chapters. The first chapter defines the scope of the study and provides the background information for the site and language studied. The second chapter is a critical evaluation of the literature relevant to the present study. The first part of chapter two discusses the work of William Labov and Leslie Milroy, whose work provides the basis for much of the sociolinguistic research being conducted today. The second part of chapter two notes sociolinguistic studies on Chinese, and examines in some detail the results of three theses on Cantonese sociolinguistic variation carried out in Hong Kong (Yeung 1980, Pan 1981, Bauer 1982).

Chapter three is a detailed discussion of the variables used in this study, both dependent and independent. The first part of the chapter outlines what has been reported in dialect surveys concerning the three dependent, or sociolinguistic, variables for both Hong Kong and its surrounding areas. The second part discusses independent variables and some theoretical issues involving these variables.

In Chapter four I discuss methodology and design of the study, including the sample, data collection and statistical methods. The role of the investigator in the study is also discussed in chapter four.

The fifth chapter is an analysis of the results. Three data bases form the basis of this study. The first data base pertains primarily to social background factors and the second focuses on speech register. A third data base is used to analyze the effect of word class on the three dependent variables. The data from these three data bases are used to analyze the effects of speaker background on each of the three sociolinguistic variables: ng-/0-, n-/l-, and k-/h-. The first part of chapter five analyzes statistical results of the study. Chapter five also contains selected non-quantitative observations taken from my fieldwork notes. Lastly, chapter five includes a brief comparison of my results with previous data on Cantonese variation, particularly Pan (1981) and Yeung (1980).

The sixth and concluding chapter discusses the main findings and implications of the results of the study, both theoretical and practical. Directions for future study are also discussed.

The first two appendices (A and B) provide the two main instruments I used to gather data on informants: a subject background data sheet and a list of interview questions. English translations are also provided. Appendices C to F provide detailed results for the study, including individual data and group statistical results with statistical output. Lastly, Appendix G lists the romanization, characters and parts of speech for all tokens from my transcripts that involve the three sociolinguistic variables, and Appendix H provides the characters for many of the place names used herein.

1.1 Scope

Milroy (1980:10) defines a sociolinguistic variable as "a linguistic element which co-varies not only with other linguistic elements, but also with a number of extra-linguistic independent variables such as social class, age, sex, ethnic group or contextual style." In Cantonese Chinese, as in all languages, there are many such sociolinguistic variables. I investigate here how a set of these alternations is stratified socially in current Hong Kong speech. The set includes the variant initials n-/l-, ng-/0- and k-/h- (third person pronoun only). Following are a few examples of words in current Hong Kong Cantonese speech which exhibit the kind of variation noted above:

    naahm              laahm              'male'
    neuih              leuih              'female'
    ho-nahng           ho-lahng           'possible'
    cheut-nihn         cheut-lihn         'next year'
    ngoih-gwok-yahn    oih-gwok-yahn      'foreigner'
    ngaam-ngaam        aam-aam            'right now'
    keuih              heuih              'he/she'
    keuihdeih          heuihdeih          'they'

For many Hong Kong speakers these variables are thought to be in free variation. By free variation I mean here that for speakers who have both forms in their speech the choice of one variant over another does not distinguish meaning. The variables above are chosen on the basis of their feasibility of study as well as their potential to reveal interesting facts about the nature of social variation in Cantonese and language in general. Moreover, each seems to present an essentially dichotomous choice.

The linguistic variables are observed as to their social stratification. The speaker's choice of variants is analyzed to see how it is affected by the independent social variables noted above. Though not a main focus of this study, several independent linguistic factors (e.g., word class and phonological environment) are also examined briefly. In addition to the above variables, this study puts a special emphasis on speech register as a variable. In one of the data bases the speech of the informants was recorded at three levels: public presentation, interview, and impromptu conversation. The frequency of usage of each variant is then analyzed with respect to the three levels/styles. Statistical significance tests are applied to the data collected to determine whether the differences in frequency of a variant for each sub-group are indeed significant or simply the result of chance. Methods of collection and statistical analysis are discussed in detail in chapter 4.
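The logic of such a significance test can be illustrated with a minimal sketch. It assumes a Pearson chi-square test on a 2x2 table of token counts, which is only one of several tests that could be applied. The function name `chi_square_2x2` and the counts below are hypothetical and are not data from this study.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if variant choice were independent of group.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical token counts (illustration only, NOT data from this study):
# rows = speaker group, columns = (tokens with [l-], tokens with [n-])
counts = [[60, 40],   # group A
          [40, 60]]   # group B
chi2, p = chi_square_2x2(counts)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")  # chi-square = 8.00, p = 0.0047
```

With these invented counts the difference between the two groups would be unlikely to arise by chance alone (p < 0.01). A smaller difference, or fewer tokens, would yield a larger p value and a weaker case for social stratification.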

1.2 Romanization, Conventions and Terminology

References and place names are generally given in pinyin romanization. Exceptions to the general rule are made when the Chinese place names have well-established alternatives, such as Hong Kong for the pinyin Xianggang. When more than one version is common, both are provided. For Chinese authors from outside the People's Republic of China (PRC) who use an alternative romanization system, I use whatever system they themselves use. Cantonese citations are mostly in the Yale system, which is the most commonly used system for Cantonese. As with Mandarin, when there is a widely accepted alternative spelling for a place name or for an author's name, I use the alternative spelling.

Occasionally, words or phonemes are represented using the International Phonetic Alphabet (IPA). For the convenience of readers not familiar with the Yale system, a Yale/IPA comparison chart is provided in the preliminary pages under abbreviations.

When referring to varieties of the Chinese language, I use the term dialect for the Chinese term fangyan, literally 'regional speech.' Although there has been some controversy as to whether to refer to mutually unintelligible speech varieties as dialects or as languages (see Bloomfield 1933; Hudson 1980: Chapter 2; Norman 1988: chapter 1; Chan and Tai 1989), the term remains useful for several reasons. Unlike related languages such as English and German, the Chinese dialect groups have a long and continuous history of cultural and political unity. Moreover, those who speak Cantonese or Shanghaiese think of themselves as speakers of dialects of Chinese and not speakers of different, related languages. This perception is largely a result of strong political unity and a shared written system. Consequently, there is a cultural reality to the terms in spite of objections on linguistic grounds.

Whatever the merits or limitations of the term dialect as a technical term, it remains a useful term to distinguish historically recognized speech varieties. In any case, the issue is not central to this study. In this report I adopt the convention of most Chinese linguists and Chinese speakers and refer to the major and minor areal groupings as dialects (as described below in section 1.4).

1.3 Aims and Goals

The understanding of many aspects of language can be advanced by a quantitative study. As Trudgill (in Milroy 1980:ix) and others point out, studies of this kind raise questions such as: what styles, registers, etc. are employed in various social situations? What are the conditions under which these forms and variations are used? What are the implications in selecting a given choice? In addition, the study of variation in language can aid us in the understanding of the nature and the mechanisms of language change.

The study has several aims. The first and most basic aim is to produce a clear synchronic picture of the language situation in Hong Kong with regard to the three sociolinguistic variables in question and to demonstrate that variation in Cantonese is indeed meaningful and central to language use. Furthermore, I hope to throw some light on the role of social factors in language variation, an issue which has so often been relegated to the trivial or the peripheral in the study of Chinese. Indeed, my thesis here is that much of the variation found in language is patterned and that mastery of variation is an integral part of knowing a language.

Another key issue explored in this study is the effect of speech register on sociolinguistic variables. Attention to speech register, called situational context by Labov, is not unique to this study and has in fact become routine in sociolinguistic work. However, this study differs conceptually from most other studies in that the registers chosen for analysis can be seen as natural speech types. It has been demonstrated by numerous studies that speech type is a factor in variation, but often the speech types themselves are not natural to a speaker's experience (e.g., word lists and minimal pairs). Consequently, not very much is known about how natural contexts affect sociolinguistic variables. This study devotes a separate data base to the investigation of speech register as reflected by three speech types: public speaking, interview and impromptu conversation.

Another issue to be raised is whether or not the variation in question is a sound change in progress. Labov (1972a:122) discusses some of the ways in which quantitative sociolinguistic study can aid us in understanding whether or not an instance of variation is a change in progress. Labov particularly identifies crossover in variation as it relates to social class as a sign of change in progress; for example, one group might hyper-correct in a certain context toward a perceived prestige form.

A secondary goal of this report is to provide an update on perceived 'changes in progress.' Some observers and scholars of the Cantonese language situation (Hashimoto 1972:89) have speculated or implied that certain variation in Cantonese demonstrates a change in progress, notably with the n-/l- variable and to a lesser extent with the ng-/0- variable. The only two quantitative studies concerning ng-/0- and n-/l- (Pan 1981 and Yeung 1980) report the existence of variation and suggest that this variation represents a phonological change in progress.

It has been nearly ten years since the two Hong Kong studies were completed and it will be interesting to compare their results with mine. Though the design of my study differs from Yeung's and Pan's, I hope that a comparison of our results will be useful in determining whether the observed variation is indeed a change in progress or an age-grading situation.

1.4 Description of the Language Situation

This study is intended for two distinct audiences: those interested in Chinese dialect studies and those interested more generally in sociolinguistics. This section is intended primarily as background for the latter group. Below I give a brief sketch of the Chinese dialects as they have been traditionally classified, focusing particularly on the Yue (Cantonese) group. In addition, I give a brief description of Hong Kong and its historical and current language situation. For a fuller historical and synchronic description of the Yue group, the reader is referred to Hashimoto (1972) or Norman (1988). The main Chinese-language sources are Yuan (1983) and Cheung (1972).

1.4.1 Chinese Dialects: Traditional Classification

Chinese is usually divided into seven major dialect groups (following Li Fang-kuei 1939), each of which is mutually unintelligible with the others. There have been no recent surveys or estimates of the numbers of speakers of each group. Relative percentages are generally given based on a dialect survey carried out between 1955 and 1958 by Yuan (1983). Based on a 1956 population statistic for China of 547 million, Yuan estimates the number of dialect speakers in each group. The population of China has nearly doubled since Yuan's survey. The overall 1983 population figure for China was 1.02 billion (Guojia Tongjiju [National Statistical Bureau] 1984:2). Nevertheless, the relative proportions for each dialect group probably are not too far off. Yuan's figures are outlined in Table 1 below (after Chan 1987:77).

Table 1 Yuan's Population Figures by Major Dialect Groups

Dialect Group                  Number of Speakers    Percentage

1. Mandarin                    387 million           70.0%
   a) Northern                 (186 million)
   b) Northwestern             (53 million)
   c) Southeastern             (26 million)
   d) Southwestern             (120 million)
2. Wu                          46 million            8.4%
3. Xiang (=Hunanese)           26 million            5.0%
4. Gan (=Jiangxi)              13 million            2.4%
5. Kejia (=Hakka)              20 million            4.0%
6. Yue (=Cantonese)            7 million             5.0%
7. Min                         15 million            4.2%
   a) Northern                 (7 million)
   b) Southern (=Fukienese)    (15 million)

Within each major dialect group there are also sub-groups. For example, Yuan lists four sub-groups of Mandarin and two sub-groups of Min. The Yue group, as is discussed below, can be further sub-divided as well. Figure 1 shows the provinces of China and Figure 2 shows the approximate boundaries of the seven dialect groups. Table 2 outlines their distribution with respect to provinces.

Figure 1 Provinces of China

Figure 2 Distribution of Chinese Dialect Groups in China

Table 2 Dialect Groups and the Provinces in Which They are Spoken

1. Mandarin Dialect Group
   a) Northern
      [...] (including [...] and Tianjin)
      [...], [...], [...], Heilongjiang (=Manchuria)
      Northwestern part of [...]
      Northeastern part of [...]
      Eastern part of Inner Mongolia
   b) Northwestern
      Shanxi (including Taiyuan)
      Shaanxi (including Xi'an)
      Gansu (including Lanzhou)
      Xinjiang, Ningxia, Qinghai
      Western part of Inner Mongolia
   c) Southeastern
      Central Jiangsu (including Yangzhou)
      Central Anhui (including Hefei)
      Southeastern [...]
      Northern Jiangxi
   d) Southwestern
      [...] (including [...] and [...])
      Guizhou
      Hubei (except southeast corner)
      Northwestern part of Hunan
      Northwestern part of Guangxi (including Guilin)

2. Wu Dialect Group
      Southern Jiangsu (including [...], Suzhou, Wuxi)
      Southeastern Anhui

3. Xiang Dialect Group (=Hunanese)
      Hunan (including [...])

4. Gan Dialect Group (=Jiangxi)
      Jiangxi (including Nanchang)
      Southern Anhui
      Southeastern Hubei

5. Kejia Dialect Group (=Hakka)
      Communities scattered in Sichuan, Jiangxi, Hunan, Guangdong (including Meixian), Guangxi, and [...]

6. Yue Dialect Group (=Cantonese)
      Guangdong (including Guangzhou (=Canton), Taishan, Zhongshan, [...], Macao, Hong Kong)
      Southeastern Guangxi

7. Min Dialect Group
   a) Northern Min
      Southeastern Zhejiang
      Northeastern part of Fujian (including Fuzhou)
   b) Southern Min (=Fukienese)
      Southern part of Fujian (including Xiamen (=Amoy))
      Northeastern Guangdong (including Chaozhou, Dongshan and Hainan islands)
      Taiwan

1.4.2 The Yue (Cantonese) Group

The Yue dialect group, commonly called Cantonese, is spoken throughout Guangdong and in southeastern Guangxi. Strictly speaking, the term Cantonese is used only when referring to the variety of Yue spoken in the city of Guangzhou (Canton) and in Hong Kong. Although the Yue group is fairly homogeneous compared to some other dialect groups of China, there are several major subdivisions within the group. Yuan (1983:177) classifies the Yue dialects into five basic groups. These five groups do not comprise all areas where the Yue dialect is spoken in China, but they do cover most of the territory where the Yue dialect is common. Moreover, Yuan does not specifically indicate where many of the various villages, particularly in the Coastal Yue group, belong in relation to the five sub-groups. For the most part one can determine boundaries by the description Yuan provides, but with certain locales the situation is not clear. For example, Coastal Yue is said to include certain areas along the West River. Implied, then, is that certain areas along the West River do not belong to Coastal Yue. Table 3 below summarizes Yuan's groupings (1983:179 after Bauer 1982:25). Figure 3 shows the approximate boundaries of Yuan's subgroupings.

Table 3 Five Sub-Groups of the Yue Dialects According to Place Spoken

Sub-Group          Places Spoken

1. Coastal Yue     Much of the Pearl Delta. Includes Guangzhou, Bao'an, as well
                   as the ('Three Counties') area of Panyu, Nanhai and Shunde.
                   Also includes certain areas along the West River.

2. Yin-Lian        Spoken in the area between Yinzhou and Lianzhou in the
                   coastal area of Guangxi Province.

3. Gao-Lei         Spoken in the area between Gaozhou and Leizhou in the
                   southwestern part of Guangdong Province.

4. Siyi (Seiyap)   Spoken in an area covering Taishan, Xinhui, Kaiping and
                   Enping in an area southwest of Guangzhou and near the coast
                   of the South China Sea.

5. Southern Gui    Spoken in area including Wuzhou, Rongxian, Yulin, Gui, Bobai
                   and other areas in the southeastern part of Guangxi province.

Figure 3 Map of the Yue-Speaking Region of Guangdong and Guangxi Provinces

Though not mentioned specifically in Yuan's classification of Yue dialects, Hong Kong Cantonese, with its association with Guangzhou, belongs to the Coastal Yue group.

Though there are differences⁴ between the variety of Chinese spoken in Hong Kong and the variety spoken in Guangzhou, both places are widely recognized as representing Standard Cantonese. Most Hong Kong residents who do not speak Standard Cantonese speak another Coastal Yue dialect. Yuan also fails to mention other important Yue dialect sites, such as Zhongshan (=Shiqi) and Macau. Although Zhongshan varies noticeably from the Guangzhou/Hong Kong standard, it is probably closer to Yuan's Coastal Yue group than to the Siyi group to its west. The area around Zhongshan is geographically isolated from the other two centers of Standard Cantonese; it is further south of Guangzhou than the Coastal Yue dialects mentioned by Yuan and is separated from Hong Kong by the Pearl River delta. It is clearly not a Siyi dialect and should probably be placed in the Coastal Yue group in Yuan's classification. However, for my own purposes I also consider it separately in my analysis (see section 3.2).

Although Macau is part of the same region as Zhongshan, it has come under heavy influence from Hong Kong since about the 1960's through increasingly strong commercial contact. In addition, Macau receives most of Hong Kong's radio and television broadcasts.

Yuan's classification does not cover all the areas where there are speakers of Yue dialects, but only those where Yue dialects dominate. Moreover, within the Yue-dominant areas other non-Yue dialects can be found. Kejia-speaking villages are widely spread throughout southern Guangdong province, and in eastern Guangdong Min speakers are found alongside Yue speakers. Boundaries are not always clear and there is movement between areas. Consequently, the situation for the Yue-speaking areas is more complex than a simple taxonomy might indicate. Nevertheless, Yuan's groupings provide us with the general picture concerning the Yue dialect areas.

1.4.3 Hong Kong Speech Community: Historical Setting and Current Situation

The territory of Hong Kong is located on the southern coast of Guangdong province. The territory consists of three main parts: Hong Kong Island, Kowloon Peninsula and the New Territories. The territory also encompasses more than 200 islands (see figure 4). Both Hong Kong Island and Kowloon have been British colonies for more than 100 years. Britain acquired Hong Kong Island in 1841 as compensation for China's loss of the First Opium War (1840-42). Nineteen years later (1860), Britain also acquired Kowloon Peninsula as far north as Boundary Street along with in perpetuity from China as part of the . In 1898, the New Territories with 235 islands were leased to Britain for a period of 99 years.

Figure 4 Map of the Territory of Hong Kong (Adapted from American Chamber of Commerce in Hong Kong 1986)

As part of the Joint Declaration of December 1984, the British and the Chinese governments agreed to transfer the whole of Hong Kong territory back to the People's Republic of China on July 1, 1997. The territory will then have the status of Special Administrative Region and is to be governed by the provisions of the Hong Kong Basic Law, which was signed in March of 1990. Under the agreement, Hong Kong is to be allowed to preserve its free-market system and to exercise a large measure of political autonomy for at least 50 years after 1997. However, in the wake of the Tiananmen incident of June 4, 1989, in which thousands were killed or injured when the Beijing government put down the pro-democracy movement, there is widespread disbelief in the efficacy of the Basic Law document. As a result, the pace of emigration out of Hong Kong is expected to pick up considerably.

Hong Kong covers a land mass of about 1,060 square kilometers and, according to the 1981 Census, has a population of 4,986,560. Most of the population (72.8%) lives either on Hong Kong Island (23.7%) or on Kowloon (49.1%). Over the last ten years or so the government has established 'New Towns' around several New Territory villages. The four largest resettlement locations are around Tsuen Wan, Yuen Long/Tuen Mun, Taipo/Fanling and Shatin. These resettlement locations, according to the 1981 Census, together make up 24.4% of the population of Hong Kong. The other 2.8% of the population live on the outlying islands, in the remote parts of the New Territories, or permanently on boats in the various harbors of Hong Kong.

In spite of Hong Kong's popular image as an English-speaking British colony, its population is overwhelmingly Chinese. The 1981 Census reports that 98% of Hong Kong's population claims Chinese origins. The next largest group is the British (.5%), followed by peoples from the countries just south of Hong Kong, including the Philippines, Malaysia and Indonesia. Details are presented in Table 4 below.

Table 4 Hong Kong Population by Place of Origin (adapted from Table 3, 1981 Hong Kong Census, Basic Tables)

Place of Origin                                              Total Number  Percentage

Hong Kong                                                         124,279         2.5
China
  Guangzhou, Macau, and adjacent areas                          2,455,749        49.2
  Siyi (Seiyap)                                                   814,309        16.3
  Chaozhou                                                        566,044        11.4
  Elsewhere in Guangdong                                          470,288         9.4
  Elsewhere in China                                              454,985         9.1
  Sub-total                                                     4,761,375        95.5
Singapore, Malaysia, The Philippines, Indonesia and Brunei         19,630          .4
Cambodia, Vietnam, Laos, Burma and Thailand                         9,007          .2
Japan                                                               6,740          .1
India, Pakistan, Bangladesh and Sri-Lanka                          11,867          .2
United Kingdom                                                     25,703          .5
U.S.A.                                                              5,483          .1
Canada                                                                761           -
Australia and New Zealand                                           4,232          .1
Other European Countries                                            5,980          .1
Other Countries                                                    11,503          .2
Total                                                           4,986,560       99.8*

*less than 100% because of rounding
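The rounding note under Table 4 can be checked directly from the counts. The following is a minimal sketch, not part of the census tables: it recomputes each row's share to one decimal place and sums the rounded shares. The helper name `tenths_of_percent` and the dictionary layout are illustrative assumptions; the counts themselves are those given in Table 4.

```python
# Counts by place of origin, copied from Table 4 (1981 Census).
COUNTS = {
    "Hong Kong": 124_279,
    "Guangzhou, Macau, and adjacent areas": 2_455_749,
    "Siyi (Seiyap)": 814_309,
    "Chaozhou": 566_044,
    "Elsewhere in Guangdong": 470_288,
    "Elsewhere in China": 454_985,
    "Singapore, Malaysia, The Philippines, Indonesia and Brunei": 19_630,
    "Cambodia, Vietnam, Laos, Burma and Thailand": 9_007,
    "Japan": 6_740,
    "India, Pakistan, Bangladesh and Sri-Lanka": 11_867,
    "United Kingdom": 25_703,
    "U.S.A.": 5_483,
    "Canada": 761,
    "Australia and New Zealand": 4_232,
    "Other European Countries": 5_980,
    "Other Countries": 11_503,
}


def tenths_of_percent(count, total):
    """Share of total in tenths of a percent, rounded half up (integers only)."""
    return (2000 * count + total) // (2 * total)


total = sum(COUNTS.values())
shares = {place: tenths_of_percent(n, total) for place, n in COUNTS.items()}

# Each share is held as an integer number of tenths (e.g. 492 means 49.2%),
# so summing them avoids floating-point noise.
rounded_sum_tenths = sum(shares.values())
print(total, rounded_sum_tenths / 10)
```

Working in integer tenths keeps the sum exact; the individually rounded shares fall slightly short of 100% even though the raw counts add up to the census total.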

As a percentage of the population, the non-Chinese community is quite small. However, the two percent non-Chinese minority exerts a large influence over the government and business sectors of society. Most of the non-Chinese foreigners from Europe, and Japan are involved in international business and are as such disproportionately influential in the affairs of the territory. The influence of the non-Chinese on the language of the Cantonese majority is much more open to question (see Chan and Kwok 1982, 1985).

Along with the ethnic diversity, there is a great diversity of languages. However, in spite of the large number of languages spoken in Hong Kong, the predominant language is Chinese. The overwhelming majority of those speakers use some form of Cantonese.

Although the 1981 census does not include data about language use, the 1971 census did include information about the usual home language spoken by residents of Hong Kong. Note the figures in table 5 below.

Table 5 Place of Origin and Usual Language (adapted from Table 3, 1971 Hong Kong Population and Housing Census, p.71)

Language Used in Home (by Number of Speakers)

Place of Origin          Cantonese    Hakka    Hoklo  Seiyap  Other Lang.  English   Other   Mute      Total
                                                                of China
HK, NT                     158,790   23,790    1,506     153          586      392      40    442    185,699
Canton, Macau            1,983,372   62,664   11,539   2,571        7,951    1,200     293  2,493  2,072,083
Seiyap                     632,174    1,813    3,120  42,346        4,242      244      67    768    684,774
Chaozhou                   262,683    5,169  107,979   1,006       13,800      143      74    600    391,454
Guangdong, nec             232,215    9,260    4,810     461        2,916      192      74    287    250,215
China nec                  187,184    1,242   35,006     471       59,433      655     406    396    284,793
South Asia                   1,259        9       30       1          127    2,438   4,854      9      8,727
Malaysia, Singapore          2,006      119      100       3           95      504     269      3      3,099
Asian Countries              1,595       25      137       8          199      956   7,587      4     10,511
Pacific, Oceania               318        5        6       -           15    2,090     367      3      2,804
Britain nec                  3,557      122       36       2           75   24,964     225     23     29,004
Europe                         927        5        2       -           29    2,773   1,536      6      5,278
USA, Canada                    335        9        -      27           32    4,191      84      2      4,680
Cent. & South America          233        8        -       1            2      218     116      -        578
Africa                          47        3        -       -            -      109      43      -        202
Unknown                      2,540       41       24       3           13       50      15     43      2,729
Total                    3,469,235  104,284  164,295  47,053       89,515   41,119  16,050  5,079  3,936,630
Percentages                   88.1      2.6      4.2     1.2          2.3        1      .4     .1       99.9

With the exception of the Siyi (Seiyap) group, the census does not differentiate the other Yue groups. In T'sou's (1976) analysis of the 1971 census data he uses the term Basic Cantonese to specify all Yue varieties of the Pearl River Delta except the Siyi group. Presumably, the census takers did the same. Alternatively, the people themselves may not have considered the non-Siyi Yue dialects sufficiently different to refer to the dialects as anything but Cantonese. My own fieldwork experience would suggest this possibility. Whereas Siyi speakers often told me that they speak a Siyi dialect, rarely did informants say they spoke, say, Panyu at home. It appears, then, that all non-Siyi Yue dialects of the Pearl River Delta are considered as "Cantonese" in the 1971 census, Basic Cantonese by T'sou and Coastal Yue by Zhan and Cheung. Hence, in the census data, the term Cantonese is not synonymous with the term Standard Cantonese as defined earlier.

More than ninety-eight percent (98.1%) of the population of Hong Kong speaks some variety of Chinese according to census figures. 88.1% of the population uses Basic Cantonese (T'sou 1976) and just 1% speak a Siyi (Seiyap) variety of the Yue group. Speakers of all non-Yue groups combined make up about 9.1% of Hong Kong's population.

NOTES TO CHAPTER ONE

1. 0- here represents the "zero" initial, phonetically smooth onset or the .

2. Age grading is the phenomenon whereby children use a certain linguistic form with their peers and subsequently abandon that form. The form is transmitted to the next generation exclusively by children without the involvement of adult speakers. See Hockett 1950 and Hudson 1980; section 1.3.2 for further discussion of age grading.

3. Maps 1 and 2 along with table 1.2 are also from Chan (1987:6-8).

4. Lee (1983), for example, argues that the phonetic quality of Hong Kong Cantonese vowels differs from that of Guangzhou Cantonese vowels.

CHAPTER II

A Review of Selected Literature on Sociolinguistic Variation

2.0 A General Review of Some Relevant Quantitative Work

In spite of the recent proliferation of studies on sociolinguistic variation generally, the vast majority of work on Chinese is along the lines of traditional dialect study. There are exceptions to this lack of interest, to be sure (Barale 1982; Bauer 1982, 1983; Yeung 1980; Pan 1981; Sanders 1986), and these are discussed below. For the most part, however, the sociolinguistics of Chinese remains underdeveloped. In this section I discuss some work that has been done concerning sociolinguistic variation in English and Chinese. The discussion of the literature on English concerns the work of William Labov and Leslie Milroy. The work of these two scholars bears directly on the model used in the present study and will be examined below. The literature on Chinese, being much smaller, can be discussed more completely.

2.1 Key General Works on Variation in English

Among the quantitative work on the sociolinguistics of English carried out in the last two decades, the work of William Labov stands out. Labov is widely acknowledged as the founder of the modern quantitative approach to sociolinguistic variation, though many others have built upon and improved his early model. Of particular interest here is Leslie Milroy's work on Belfast English and her use of the participant-observer role. The present study draws from both models and the basic features of each are outlined and discussed below.

2.1.1 Selected Review of the Works of Labov

William Labov's pioneering works of the 1960's, such as his dissertation on Martha's Vineyard (see Labov 1972b), set the stage for much of the subsequent work in sociolinguistics. Rooted in the tradition of regional dialectology, his work is clearly aimed at the study of language change. However, he was the first to really examine social background as a factor in language variation. His framework has served as a model for many of the studies that followed. The most outstanding problem in his work is the lack of any test of significance for his descriptive statistics. Many who have modeled their studies after Labov's early work unfortunately have copied this defect as well.

Labov's study on r-stratification in New York department stores (1972b, chapter 2) first suggested the effect of social factors on a linguistic variable. Taking three department stores representing three socio-economic levels, he tests his hypothesis that the use of -r in words such as fourth, car, four, etc. is tied to careful speech situations and to speech of high prestige. After noting what sorts of items were found on the fourth floor, he approached clerks in the stores and asked "Excuse me, where can I find the women's shoes?" or "Excuse me, what floor is this?" After eliciting the inevitable reply, "Fourth" or "Fourth floor," Labov asked the person to repeat what they had said, thereby drawing out a more emphatic reply and with it a more careful pronunciation. He then estimated the age of the person and noted their sex. In this way Labov was able to observe how the variables of sex, age, social status, and speech style affect the stratification of the -r variable.

In spite of the ingenious methodology of Labov's study, there were clearly significant methodological and theoretical problems associated with it. Most important is the notion of 'observer's paradox' or 'observer effect'. There is no way to tell what kind of effect Labov's appearance or manner of speech had on the response. This problem underscores the need to minimize or at least try to account for the effect of the observer. As a result, the notion of observer's paradox has become an important part of discussions on method of data collection by Labov and others in subsequent studies.

Labov's Sociolinguistic Patterns (1972b) summarizes many important issues of his work. In chapter eight, 'The Study of Language in Social Context', Labov states five methodological axioms (1972b:208-9).

1) Style Shifting. The speaker shifts his style of speaking as the social situation and topic of discussion change. Correlated with the change in style is a change in the linguistic variables the speaker uses.

2) Attention. Styles of speaking form a continuum from the least amount of attention the speaker pays to his speech (characteristic of casual or emotional speech) to the greatest amount of attention the speaker gives to his speech (when reading a list of words or talking about the pronunciation of a word).

3) The Vernacular. The speaker pays the least attention to his speech when speaking the vernacular, and consequently, the vernacular provides the most structured data for linguistic analysis.

4) Formality. Talking to speakers about the way they speak creates a formal situation which causes them to self-consciously attend to their speech. The sociolinguist must observe the speaker's vernacular style without seeming to do so.

5) Good Data. No matter what other methods are used, the only way to obtain sufficient good data is through tape-recorded interviews.

The above axioms have become key issues in the design and implementation of most studies of social variation. Some of these axioms have become nearly universally accepted; others have been challenged. There is almost no argument with the first. Even many untrained observers will notice that people 'talk differently' when speaking in different situations. Just what that difference is and what that difference implies has been the question preoccupying those interested in sociolinguistic variation.

The second axiom is much more open to question and can be challenged on several grounds. I do not agree with Labov's notion in (2) that word lists are an appropriate representation of the formal style. Instead I use public speech-making as the most formal level. Giving public speeches is something that many people actually do, in particular those who are the subjects of the present study. On the other hand, word lists do not represent a speech event at all (except in the contrived situation between linguist and informant) and thus should be rejected as a source of natural data.

Labov's insistence in the third axiom on the vernacular as the primary object of study needs to be questioned. Labov defines the vernacular as 'the style in which the minimum attention is given to monitoring of speech' (1972b:208). Though Labov's bringing the vernacular back into the mainstream of study was a crucial step in the development of modern sociolinguistics, I believe he goes too far in his second and third axioms. As important as the vernacular is to language study, it is still only one of many varieties available to the speaker of a language. Labov implies here that there is a fundamental superiority to the vernacular as an object of study. Yet there is no reason to suppose that the vernacular provides more structured or inherently more interesting data to the sociolinguistic investigator. If we are interested in learning more about how people use language, then the main question should be whether the speech type occurs naturally and not how much attention a speaker pays to his or her speech.

The fourth axiom raises the issue of the investigator's effect on the formality of speech. That the investigator must observe the speaker's language without appearing to do so is a key concept of variation studies as well. This concept is especially important in view of Labov's strong insistence on the vernacular as the primary object of study. Labov suggests five sub-contexts (A1-5) within the generally more formal interview/visit situation that tend to yield casual speech data: speech outside the formal interview (e.g., pre-interview remarks); speech with a third person; speech not in direct response to questions; childhood rhymes and customs; and danger-of-death questions. Common to these situations is that the subject must be intently enough involved in the conversation so as not to be too concerned with his speech. But again, the crucial issue is not whether subjects pay attention to their speech or not, but whether they artificially do so for the benefit of the observer. There is no problem in observing and analyzing attentive speech; the problem arises only when we observe it to the exclusion of other speech, including the vernacular.

Labov's fifth and last axiom on good data needs revision. This axiom was put forward almost two decades ago and was at least partially motivated by problems in obtaining good quality recordings in non-interview situations. Recent technological advancements in recording equipment now allow the investigator to get adequate quality data in many other situations as well. Investigators adopting a participant-observer role (cf. Milroy 1980), for example, can now often get data of very good quality without primary reliance on the interview. We thus can and should expand our data base to give status to many kinds of data.

Labov's work has been well discussed in a large portion of the literature on sociolinguistics and is widely known by many in related fields as well. Moreover, its scope prevents a full discussion here. It has served, and continues to serve, a central role in the study of social variation in language.

2.1.2 Milroy's Social Networks

The next study to be discussed here is Leslie Milroy's (1980) Language and Social Networks. Building on the tradition of Labov, this work analyzes eight linguistic variables according to sex (gender), age, place (i.e., neighborhood) and speech level for working-class speakers in Belfast, Ireland. The central innovation in Milroy's study is the application of the concept of 'social network' as an independent, non-linguistic variable. Her conception of social network is built on the empirical work of European anthropologists (especially Mitchell 1969; Boissevain and Mitchell 1973; Boissevain 1974). Using the network concept, Milroy develops a network scoring system based on the relative 'density' and 'multiplicity' of the speaker's contacts or social group. In her study density is a measure of the number of people in a group and multiplicity is a measure of the number of ways that people are linked to one another (e.g., as kin, neighbor, co-worker, voluntary associate (friend)).

While the notion of social network as a non-linguistic variable proves to be highly significant in almost all the variables examined by Milroy, it is not certain that it would be so highly significant among all groups. Belfast was selected in large part based on its social situation. The working-class neighborhoods, in the context of long-standing conditions of virtual civil war, make the notion of network particularly salient there. There is little doubt that networks exist in all speech communities and probably have some effect on the use of certain linguistic forms. In the case of Hong Kong, however, it is unlikely that the concept of network is as important as it turned out to be in Belfast. In the last forty or so years urban Hong Kong society has become extremely mobile and neighborhoods of the type described by Milroy are now rare. Moreover, there is no great political polarization of the kind present in Belfast.

In addition to social network as a variable, Milroy uses three levels of style as variables: spontaneous, interview, and word list. Both the interview style and the spontaneous style are used in a modified form in the present study.

Another important part of Milroy's study is her extensive discussion of methodological issues. Among the most important of the issues discussed there are Labov's notion of observer's paradox (Milroy 1980:45), collection methods, and insider/outsider influences (cf. Labov 1972b).

Milroy (1980) points out how other factors in the fieldwork situation can overcome the effect the observer has on spontaneous speech, most notably in her case the length of observation time. Though speakers do monitor and modify their speech when knowingly observed, their capacity to do so for long stretches is limited. Therefore, when one has the opportunity to record long stretches of conversation, the observer's effect is minimized. Labov has shown in his 'Language in the Inner City' (1972a) study that peer pressure also can reduce the informants' self-monitoring: pressure from peers (fellow gang members) on the informants to conform to their usage was stronger than the tendency to shift style to benefit the outsider.

Lastly, a crucial aspect of Milroy's study is her use of statistical analysis to handle the data she collected through her case studies. Analysis of Variance (ANOVA) is employed to determine whether the observed variance is statistically significant or not. A significance test is critical because what appears to be a large difference sometimes is not significant at all because of a small number of cases or a large number of variables. In addition, the ANOVA can show interaction effects between variables, whereby the effect of one variable might cancel or enhance another variable's significance. As important as a significance test is to quantitative sociolinguistic study, many have ignored it, choosing instead to rely purely on descriptive statistics (charts, tables, etc.). Significance tests are overlooked in Labov's early work and since then by many who have emulated his work.
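To make concrete what such a significance test computes, the following is a minimal sketch of a one-way ANOVA F statistic. It is an illustration only, not Milroy's procedure or data: the scores are hypothetical, the function name `one_way_anova_f` is an assumption, and a one-way design is used for brevity, so it does not show interaction effects, which require a design with two or more factors.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists; each inner list holds the scores of the
    speakers in one social group (for example, one age group).
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three groups of three speakers each.
scores_by_group = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(scores_by_group)
print(f_stat, df_b, df_w)
```

The F value is then compared against the F distribution with the two degrees of freedom to obtain a significance level. With few speakers per group the within-group degrees of freedom are small, which is why an apparently large difference between group means may still fail to reach significance.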

2.2 Work on Chinese

The literature on quantitative research of variation in Chinese is small indeed. Most of what work does exist pertains to the Cantonese dialect, including two University of Hong Kong M.A. theses (Yeung 1980, Pan 1981). Following these two works are a dissertation and two follow-up papers by Robert Bauer (1982, 1983, 1986). In addition there is one paper based on the second of the theses above by Peter Pan (n.d.). All of the above works concern phonological variation in Hong Kong Cantonese, though only the two M.A. theses deal with any of the variables under discussion here.

In addition to several works on Cantonese, there are two dissertations that deal with variation in Mandarin. The first (Barale 1982) is a quantitative analysis of final loss in the . The second (Sanders 1986) deals with syntactic variation in Pekingese. Also, in the last few years several papers have appeared in the PRC literature dealing with sociolinguistic variation in Mandarin. Hu's (1988) paper on Beijing Mandarin (1987), for example, correlates the retroflex/palatal series with sex and age.

2.2.1 Yeung's Study on Cantonese

Though variation in Cantonese pronunciation has been noted before, Helen Yeung's M.A. thesis (1980) is the first to make a systematic investigation of the phonological variation. Her work looks at six variables in Hong Kong Cantonese: n-/l-, kw-/k-, ng-/0- (/?-/ in Yeung's representation), ts-/tsh-, -m/-p, and -ng/-k. The thesis consists of two sub-studies: the first considers only the first three variables and the second all six variables.

Informants were approached in parks and on a university campus and asked to identify key objects in photos whose names involved the variables. Background information was obtained after completion of the interview. Age and sex were then investigated as factors affecting the production of each variant.

The first study divides the data into the following sub-groups, with 20 subjects evenly distributed as follows:

Age        20-30    45-55
Male         5        5
Female       5        5

Note that Yeung here has no informants between 30 and 45, a potentially very interesting age group, especially if one proposes a change in progress as she does. This middle age group might well exhibit the greatest variation as a potential transition group.

Yeung reports both age and sex to be factors in use of the three variables considered in her first study. Specifically she reports that all younger speakers have 'switched to l-' (p.7-8), as have most of the older males. She also reports that age plays a role in the production of the kw-/k- variable, the older group tending toward kw-. With the velar nasal/zero alternation Yeung again found gender to be the most important factor affecting variable choice. She found that females of both age groups tended to preserve the historical ng- and to change historical zero (or glottal) to velar nasal ng-.

In Yeung's second study the two age groups are expanded to five and the age range widened to include speakers as young as 12 years old. In addition, a story was used in place of the picture prompts of the first study to elicit tokens. Furthermore, three additional variables were added to the three of the first study. The five age groups divided by sex then are as follows, each again represented by five informants:

Age        12-16    20-27    30-45    40-46    50-57
Male         5        5        5        5        5
Female       5        5        5        5        5

As in the first study it is reported that males have a higher frequency of l-, and so too it is observed that the younger the speaker, both male and female, the higher the frequency of the l- initial.

As in the first study also it is reported that males of all age groups drop the historical ng- more often than females. The younger male speakers also tend to preserve historical zero initial where the older speakers of both sexes tend to substitute ng-. Thus males are reported to have a higher total production of 0-.

Both studies agree on most points. First, the younger the speaker the more innovative is his production, that is, the more he deviates from the historical form. Male speakers were reported to be generally more innovative as well. The exception to this tendency is the female speakers' propensity to use the [ng-] form in place of historical 0-.

A hypothesis that was not supported in Yeung's results was place of origin or ethnicity as a factor in variation. Following T'sou (1976), Yeung divided her informants into five ethnic groups according to self-identification: Basic Cantonese (= Yuan's Coastal Yue), Siyi, Kejia, Chaozhou (Min), and Out-of-Stater (=all other Chinese dialects). She found almost no evidence of ethnicity or language background affecting choice of variant (p.32).

There are several important questions concerning Yeung's studies that need to be raised. First, the number of informants in the first study is quite small (20) and thus the statistical validity of the results is questionable. Secondly, tokens were elicited in only one situational context in each study (picture prompts in the first and story telling in the second). The interaction of age, sex and other social variables with style and context is not considered in Yeung's reports. In addition, the claim that l- is lost is probably premature and unsubstantiated by the data, as the variable may be affected by contextual factors. The statistical validity of the study's claims is examined in chapter 6.

2.2.2 Pan's Study of Prestige Forms in Hong Kong

Another M.A. thesis on phonological variation in Cantonese is Pan (1981). In the first part of his study, Pan considers social factors that affect usage of several phonological variables, including two that are discussed here: n-/l- and ng-/0-. In addition, the variable kw-/k- is considered in his analysis. In the second stage, Pan tests the hypothesis that the n- variant is the prestige form for the first variable. For the second variable his hypothesis is that for the historical initial *ng-, [ng-] is the prestige form. Likewise, for the *0- initial, he posits [0-] as the prestige form, while acknowledging the potential ambiguities. Essentially, then, he posits the historical forms as prestige forms.

He tests two groups. The first (Group I) is made up of seven university students of both sexes and has a mean age of 20.6 years (range of 19.6 to 20.7). The second group (II) is comprised of six white collar workers, also of both sexes, with a mean age of 29.6 years (range of 27.6 to 30.7).

In studying social factors affecting speech Pan closely follows Labov's framework. Sex and age along with contextual style are considered as factors affecting production of variant forms of the three variables. Pan considers three contextual styles similar to Labov's. In order of increased formality they are: Casual Speech (CS), Reading Passage Style (RPS), and Word List (WL). CS was collected by recording speakers under the guise of a group discussion and therefore cannot be called casual in the sense of any of Labov's five sub-contexts A1-5 (1972b:87-94). Moreover, the speech data is more formal than the running, spontaneous speech of Milroy's informants.

The second stage of Pan's study involved setting up a modified matched guise procedure. The informants listened to two versions of the same recorded story from two separate cassettes. The two passages were read by the same speaker, differing only in that one was read using all the hypothesized prestige forms (roughly following historical lines) (/n-/, /ng-/ and /kw-/) and the other using all non-prestige forms (/l-/, /0-/ and /k-/). The subjects were then asked to determine which speaker had the higher socio-economic status.

This use of the modified guise procedure is somewhat unusual. Normally the researcher produces tapes of two entirely different languages or at least clearly distinguishable dialects. In Pan's experiment, the level of variation was low enough that the subjects might well not immediately and consciously assume the existence of two different speakers. To remedy this problem Pan told the listeners that the speakers had been specially chosen for their similar voices. As Pan himself noted (1981:13), such instruction may have biased the subjects toward differentiating the two story versions where they might not otherwise have done so. He defends his method by arguing that the trade-off was necessary because without the additional instructions the subjects would likely assume one speaker and not be sensitive to the low-level variation. As a check against the bias, he examined patterns of multiple exposures to see if they were regular, arguing that a haphazard pattern would indicate that the subjects were not really noting any difference. In addition, he gave the subjects the option of saying that they did not perceive any differences between the two versions.

With all three variants taken as a group, the older group of speakers rated the version with the proposed conservative forms as more prestigious. If the groups are broken down by sex and age group, Group II females overwhelmingly (F=83% M=19.5%) rated the conservative version as more prestigious. The males, on the other hand, showed a slightly higher tendency to rate the conservative form higher in Group I (F=43.5% M=53%). The results were not conclusive within age groups.²

Pan found in his investigation of formality that the frequency of production of his proposed prestige forms did in fact increase as formality level increased. In the word-list portion of Pan's experiment he first asked the subjects to read the list to record production. Second, he asked them to listen to a pre-recorded list of words, all with the potential for the variation noted above. In each case, the items were read twice, once using the conservative form and once with the innovative form. The subject was then asked to decide which form was 'better' or more 'proper' by ticking the appropriate column on his or her list. In the case of the two variables of interest here (ng-/0- and n-/l-), Pan reported that the percentage of those judging the first form as better was high in both groups. The percentages are as follows:

Table 6
Percent Reporting Conservative Form Texts as Better

          Group I    Group II
[ng-]     76%        80%
[n-]      59%        70%

Though the figures suggest a general agreement on prestige form, they are not conclusive, especially in the case of Group I's perception of [n-]. It should be reiterated here, though, that Pan's sample is generally educated but young. Group I's mean age is 20.6 and Group II's 29.6. Thus, a very large and very important part of the age spectrum is not accounted for. Though the somewhat high level of education might have biased the results toward reporting more prestige, the lower age range may have biased the results the other way. In addition, Pan pays no attention to the relative difference in occupation and the resulting difference in social class in his results.

In terms of production, Pan reports both age and sex to be factors in frequency of conservative forms. The older group (Group II) is reported to have produced the prestige forms in higher frequency than the younger group (Group I). Pan's (1981:96) results are as below with totals added.

Table 7
Sex/Age Group Scores for Conservative-Form Production Across Three Contextual Styles

            Group I      Group II     Averages
            (younger)    (older)
Women       .430         .613         .562
Men         .316         .350         .333
Averages    .373         .482
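As an arithmetic check on the averages in Table 7, the following Python sketch recomputes the marginal figures as simple unweighted means of the four cell scores. The function names are illustrative, and the use of unweighted means is an assumption made only for this sketch, since Pan's report gives no token counts. Under that assumption the men's row and both column averages agree with the printed figures, while the women's row comes to about .522 rather than the printed .562; the report does not say how that figure was obtained.

```python
# Cell scores from Table 7 (Pan 1981:96): conservative-form production.
scores = {
    ("Women", "Group I"): 0.430,
    ("Women", "Group II"): 0.613,
    ("Men", "Group I"): 0.316,
    ("Men", "Group II"): 0.350,
}


def unweighted_mean(values):
    """Plain arithmetic mean; an assumption, not Pan's stated method."""
    values = list(values)
    return sum(values) / len(values)


def row_mean(sex):
    """Mean across the two age groups for one sex."""
    return unweighted_mean(v for (s, _), v in scores.items() if s == sex)


def column_mean(group):
    """Mean across the two sexes for one age group."""
    return unweighted_mean(v for (_, g), v in scores.items() if g == group)


if __name__ == "__main__":
    for sex in ("Women", "Men"):
        print(f"{sex}: {row_mean(sex):.4f}")
    for group in ("Group I", "Group II"):
        print(f"{group}: {column_mean(group):.4f}")
```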

It is difficult to tell from the information provided in Pan's report whether there is a statistically significant difference between sexes and age groups. Pan does not give raw count data for individuals or token numbers, but only mean decimal frequencies. Nevertheless, it is possible that age and gender were indeed significant factors in Pan's study. Since the mean ages of the two groups are so close, the lack of a significant difference as reflected by Pan's figures does not preclude age as a significant factor in the speech community. Indeed, it will be argued later in this report that both age and sex are factors with respect to these variables in my own study.

Among the most striking findings of Pan's study is his claim of significant hypercorrection toward n- of the prescribed (or historical) [l-] initial. Unlike the situation with the ng-/0- variable, hypercorrection toward /n-/ in words with historical [l-] has not been reported elsewhere (cf. Yeung 1980) and most observers believe that this hypercorrection does not occur. Pan reports that all groups hypercorrected to some degree but that women and the older group hypercorrected most. The percentages are as follows (Pan n.d.:11):

Table 8
Percentage Hypercorrection Between Sex and Age Group (*l- > [n-])

            Group I    Group II    Combined
Women       11.0       22.3        16.7
Men         14.7       33.5        24.1
Combined    12.9       27.9        20.4

It is important to note here, however, that Pan's results are all from word lists and as such it is questionable whether this phenomenon is found in natural speech.

Though Pan's results seem to confirm his claim that the n-, ng- and kw- forms are seen as prestige variants, his weak control of important background factors in his analysis makes his results suspect. Moreover, the statistical validity of his results is questionable and cannot be tested with the data provided in the report.

2.2.3 Bauer’s Hong Kong Study

All three of Bauer's papers come out of research done for his dissertation on Hong Kong Cantonese entitled Cantonese Sociolinguistic Patterns (1982). The study is close in design to Labov's model in almost every respect. Bauer uses sociolinguistic methodology to investigate sound change in progress. Since all three of the works stem from the same larger dissertation study, I will focus here on that larger study with some reference to the subsequent papers.

Bauer's study uses data from 75 subjects ranging from 15 to 75 years old, broken down into four sub-groups as follows (p. 50):

Group A (15-22): 25 (33%)
Group B (23-30): 24 (32%)
Group C (31-44): 14 (19%)
Group D (45+):   12 (16%)

Besides age, he looks at sex (42 males and 33 females) and education level (13-16, 11-12 and 0-10) as factors in variation. He then correlates the three social variables with two linguistic variables: -ng/-m and kw-/k-. Bauer reports that age correlates with -m and k- for most subjects. He also reports a tendency for male speakers to have a somewhat higher frequency of the innovative form k- than females (85% for males; 59% for females). The total percent difference between males and females with regard to the variable -ng/-m is also not great (34% for males; 42% for females) and goes in the other direction. Below is a summary of Bauer's data concerning the variables -ng/-m and kw-/k-.

Percentage /-m/ (Bauer 1982:106)

                                Age Groups
               15-22      23-30      31-44      45+        Overall
Context        M    F     M    F     M    F     M    F     M    F
Spontaneous    91   68    61   56    14   69    0    1     42   49
Story          89   72    47   50    0    67    2    0     35   47
Word Lists     77   30    21   21    0    66    3    0     25   29
Overall        86   57    43   42    5    67    1    0     34   42

Percentage /k-/ (Bauer 1982:194)

                                Age Groups
               15-22      23-30      31-44      45+        Overall
Context        M    F     M    F     M    F     M    F     M    F
Spontaneous    98   96    97   89    98   45    71   32    91   66
Story          87   67    87   75    100  38    62   42    84   56
Word Lists     83   63    74   77    94   46    70   39    80   56
Overall        89   75    86   80    97   43    68   38    85   59
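The sex differences mentioned above can be read directly off the 'Overall' rows of the two tables. The short Python sketch below computes the male-minus-female difference in percentage points for each age group. The data layout and the function name are illustrative and are not Bauer's; the figures are those of the 'Overall' rows.

```python
# 'Overall' rows of Bauer's tables as (male %, female %) per age group.
m_overall = {  # percentage /-m/ (Bauer 1982:106)
    "15-22": (86, 57), "23-30": (43, 42), "31-44": (5, 67),
    "45+": (1, 0), "Overall": (34, 42),
}
k_overall = {  # percentage /k-/ (Bauer 1982:194)
    "15-22": (89, 75), "23-30": (86, 80), "31-44": (97, 43),
    "45+": (68, 38), "Overall": (85, 59),
}


def sex_gap(table):
    """Male minus female percentage for each age group (percentage points)."""
    return {age: male - female for age, (male, female) in table.items()}


if __name__ == "__main__":
    print("/-m/:", sex_gap(m_overall))
    print("/k-/:", sex_gap(k_overall))
```

A positive value means that males use the innovative variant more often than females in that age group; a negative value means the reverse.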

Bauer reports a lexical dimension to his proposed changes in progress as well and claims the mechanism of change to be lexical diffusion (Wang 1969). He suggests that the change begins with ngh 'five' for the variable -ng/-m and continues with a group including the words gwo (kwo) 'to pass through', gwok (kwok) 'country', gwong (kwong) 'broad', gwok (kwok) 'a surname', gwong (kwong) 'furious, wild' and gwong (kwong) 'ore'.

All Bauer's data are from interviews. Following Shuy et al. (1968), Bauer recruited most of his informants through the school system via a school principal. The children, and where possible the parents of children, were then interviewed. The interview consisted of two parts: a spontaneous speech section and a controlled section. In the spontaneous section of the interview Bauer asks questions of his subjects that concern life in Hong Kong or life in school where appropriate. Nursery rhymes were also read by the subjects. Many questions asked in the interview are adapted from Labov (1966), Shuy et al. (1968) and Trudgill (1974). The controlled part of the interview, from which 'careful speech' samples were drawn, was elicited with word lists and minimal pair lists.

Bauer's study is largely a study of historical sound change using the sociolinguistic framework of Labov to chart the change. Bauer's theoretical bias in this study is clearly toward working out apparent sound changes in progress by focusing on the environment of the proposed changes and by employing William Wang's theory of lexical diffusion (1969). He does not go beyond descriptive statistics in analyzing his data, nor does he present a unified analysis of his descriptive data. His statistical presentation is limited mostly to comments on sub-group patterns, leaving the important issue of variable interaction somewhat peripheral. As such the study's claims are not as strong as they might be otherwise. Nevertheless, Bauer's Hong Kong study provides a comprehensive investigation into the relationship of social factors and phonological variation in Cantonese, especially with regard to how that relationship concerns the theory of lexical diffusion.

NOTES TO CHAPTER TWO

1. Labov himself now calls into question the use of word lists or at least minimal pair word lists. In a presentation at the Second Northeastern Conference on Chinese Linguistics (NECCL2), he argued that speakers who do not normally use a particular form may produce it by analogy when presented with minimal pairs.

2. The following chart summarizes how Pan's informants rated conservative forms for prestige (Pan n.d.:12-13):

         Group I    Group II    Total
Women    43.5       83.0        63.3
Men      52.0       19.5        35.8
Total    47.8       51.3

CHAPTER III

Discussion of the Variables

3.0 Discussion of Variables

In chapter two I briefly discussed some of the literature on Cantonese and English sociolinguistic variation in general terms. In this chapter, I outline what is known or supposed about the sociolinguistic variables n-/l-, ng-/0- and k-/h- in Hong Kong Cantonese specifically and in other dialects spoken in the area around Hong Kong. In addition, I discuss the independent variables considered in this study.

3.1 Sociolinguistic Variables

3.1.1 Cantonese Initials

Since all three variables under consideration here involve syllable initial phonemes, I shall first describe the initial system of standard Cantonese. Table 9 below (see also Yuan 1983:180, Cheung 1972:1) represents the Yue (Guangzhou) initials. The Cantonese of Hong Kong differs in some ways from the speech of Guangzhou (see Lee 1983), but, crucially, the differences do not affect the phonological system, and in particular not the initials under investigation. The general picture can be seen below. As a typographical convenience, the velar nasal is represented in the chart by ng.

Table 9
Yue (Guangzhou) Initials

Labials        p     p'     m     f
Apicals        t     t'     n     l     ts    ts'    s     j
Velars         k     k'     ng
Labiovelars    kw    kw'    w
Laryngeals     (0)   h

Variation has been noted in Cantonese for some time (see Yuan 1983 among others). With the exception of the several studies noted in 2.2, however, this variation has only been noted and not studied in any depth.

Some speakers delabialize kw- initials to merge with initial k- and some pronounce the syllabic velar nasal ng as syllabic m (Yeung 1980, Bauer 1982). Many speakers attach a non-etymological ng- to all words beginning with a vowel (Yuan 1983:181; Hashimoto 1972:89). Confusion of the n- and l- initials has been reported in many Yue dialects (Zhan and Cheung 1987, 1988) as well as in many other dialects of Chinese (Beijing Daxue Hanyu Fangyan Cihui 4-17; Yuan 1983). In addition to the variables noted above, some speakers of Hong Kong Cantonese pronounce initial k- as [h] in the personal pronouns keuih 'he, she' and keuihdeih 'they'. The variables in the second column above, n-/l-, ng-/0- and k-/h-, are the focus of this investigation and are discussed below in more detail.

3.1.2 Tonal Inventory of Standard Cantonese

Chinese is a tonal language, and this section is intended as a simple introduction for those not familiar with the tonal system of Cantonese. The tonal systems of Chinese dialects are conventionally represented according to how their modern tonal systems fit four historical categories and two registers. The four categories are even (ping), rising (shang), going (qu) and entering (ru). The two registers are upper (yin) and lower (yang). The tonal values for category and register have evolved differently for different dialects. Therefore, the names for the categories are no longer descriptively accurate for all dialects. For example, an even tone may not be even at all in its contour, but falling or rising in some dialects. For instance, in the low register and sometimes in the high register, the standard Cantonese even tone is actually falling in contour. The chart below shows the tonal system according to the four categories and two registers as described above along with their tonal values. Values are represented using the tonal letters of Y.R. Chao (1930) along with numerical values. The first number represents the starting point and the second (or sometimes third), the ending point of each tone.

Table 10
Tonal Inventory of Standard Cantonese (Guangzhou and Hong Kong)

         even        rising    going    entering
upper    55 or 53    35        33       55, 33
lower    21          13        22       22

Though the tonal systems of Yue dialects vary widely, the tonal system of Guangzhou and the tonal system of Hong Kong are essentially the same.
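Because each tonal value is a sequence of Chao digits running from 1 (lowest pitch) to 5 (highest), the contour of a tone can be read off by comparing its first and last digits. The following minimal Python sketch does so for the values listed in Table 10. The function name and the three labels (level, rising, falling) are assumptions introduced for illustration only.

```python
def contour(tone_value: str) -> str:
    """Classify a Chao tone value by comparing its starting and ending digits."""
    start, end = int(tone_value[0]), int(tone_value[-1])
    if end > start:
        return "rising"
    if end < start:
        return "falling"
    return "level"


# Tonal values appearing in Table 10.
table_10_values = ["55", "53", "35", "33", "21", "13", "22"]

if __name__ == "__main__":
    for value in table_10_values:
        print(value, contour(value))
```

On this reading the value 21 and the variant 53 both come out as falling, which is the point made above about the even tone.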

3.1.3 n-/l- Syllable Initial

3.1.3.1 n-/l- Variation in Chinese Dialects

The dialect survey Hanyu Fangyan Cihui (1964:6-17) notes variation between /n/ and /l/ in the syllable initial position for many Chinese dialects, including Southwestern Mandarin (Chengdu, Kunming), Southeastern Mandarin (Hefei, Yangzhou), Xiang (Changsha), Gan (Nanchang), Northern Min (Fuzhou) and Yue (Guangzhou). Other sources report the alternation in the Southern Min dialect of Chaozhou (Li 1959:7) and in Kejia (Sagart 1982:127). Thus, the n-/l- variation appears to be a widespread phenomenon which affects all subgroups except Northwest and Northern Mandarin and Wu.

In most of the cases noted above the distinction between the two initials is reported as mostly intact with some limited mixture between them. However, it is reported that in the Hefei dialect *l- has totally merged with *n-. In contrast, in Yangzhou *n- is reported to have merged completely with *l-. How representative these reports are is not clear from available sources. Moreover, we are not given much background information concerning the informants, nor concerning the circumstances under which the data was recorded. Nevertheless, it is clear from many accounts that n-/l- variation exists in a large number of dialects, especially those in the south.

In addition to the surveys, a number of studies have reported phonologically conditioned variation between n- and l- in Chinese dialects, especially in southern dialects. Chan (1987: section 4.3) discusses several of these cases in her paper on post-stopped nasals. For example, in some Hunan dialects the lateral is not produced with stop closure. Yang (1984) notes a tendency toward n- before palatal medials (Chinese medials qichi and cuokou) and l- before non-palatals (i.e., Chinese medials kaikou and hekou). In some Southern Min dialects (e.g., Yonggan and Jiangle) n- and l- are said to be in free variation (Norman 1988:236). In other Southern Min dialects (e.g., Xiamen) n- and l- are said to be conditioned variants of the same phoneme (Chan 1987; Norman 1988:236): n- before nasal vowels and l- before oral vowels.

In standard Cantonese the variants n-/l- are not conditioned by phonological environment or syllable structure. That is, most if not all words involving *n- (e.g., the pronouns neih 'you', nidi 'these', etc.) can be intermittently pronounced with either n- or l-, often in the speech of a single speaker. The two variants may be said to be in free variation in Hong Kong Cantonese, though I argue in chapter five that the variation is in fact correlated with social and situational factors. If the realization of n-/l- (or indeed the other two variables) is phonologically influenced by the surrounding segments, it is not absolutely so. It is possible that a particular variant tends to occur more frequently in one phonological environment than in another and that a variable rule could be written to reflect the probability of that variant occurring in a particular environment (à la Cedergren and Sankoff 1974). Although I do not discount environmental factors, a variable rule analysis is outside the main scope of this study.
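To make the notion of a variable rule concrete, the sketch below uses the logistic form commonly adopted in later variable rule analysis, in which an input probability and one factor weight per environmental or social factor are combined on the odds scale. It is offered only as an illustration: the logistic form is one of several models proposed in that literature, the function names are hypothetical, and the input probability and factor weights in the example are invented numbers, not estimates from any Cantonese data.

```python
from math import prod


def odds(p: float) -> float:
    """Convert a probability (strictly between 0 and 1) to odds."""
    return p / (1.0 - p)


def rule_probability(input_prob: float, factor_weights: list[float]) -> float:
    """Probability that the variant occurs, given an input probability and
    the weights of the factors present in the environment (logistic form)."""
    combined = odds(input_prob) * prod(odds(w) for w in factor_weights)
    return combined / (1.0 + combined)


if __name__ == "__main__":
    # Hypothetical values: overall input 0.40, one favoring factor (0.70)
    # and one disfavoring factor (0.35).
    print(round(rule_probability(0.40, [0.70, 0.35]), 3))
```

In this form a weight of 0.5 leaves the probability unchanged, a weight above 0.5 favors the variant, and a weight below 0.5 disfavors it.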

3.1.3.2 Initials n-/l- in the Pearl River Delta

The Pearl River Delta is the area nearest to and most closely associated with Hong Kong linguistically, economically, historically and culturally. Most of the dialects spoken in the delta area are of the Yue group but there are also prominent pockets of Kejia and Min. Zhan and Cheung's (1987:8-30) survey of the dialects of the Pearl delta gives data for twenty-five Yue dialects, five Kejia dialects and one Min dialect. They report that most of these dialects maintain a distinction between the two initials /n/ and /l/, while a few are reported to have completely merged /n-/ with /l-/. The dialects of Macau, Panyu, Shunde, , , Conghua (Lutian) are reported to lack an /n/ initial (see figure 5). Zhan and Cheung describe the Dongguan dialect as basically having no initial /l/, but report that some speakers (presumably the minority) maintain the historical *l-. The dialects described for Macau, Panyu and Shunde are of the Yue group while the other three belong to the Kejia group. All but Conghua are situated close to Hong Kong and some have argued that they influence Hong Kong speech (or vice versa) (cf. Hashimoto 1972:120). Yet there are quite a number of other dialects that are at least as close both spatially and culturally that are not reported to mix /n-/ and /l-/ (notably Guangzhou).

As with Yuan's work, this survey is a report of one 'ideal' informant reading a long list of characters. Consequently, one would not expect to capture a wide range of variation. Indeed, the ideal situation is to find no variation. Unlike Yuan's survey, Zhan and Cheung make almost no note of variation. Therefore, no information concerning any socially stratified variation can be found there. We can say only that there are dialects in very close proximity to Hong Kong that sometimes do not maintain the historical distinction between /n/ and /l/ initials. The question of how the language backgrounds of the informants of the present study correlate with their usage of the variable is taken up later in the discussion of the results in chapter five.

Figure 5 Dialect Map of the Pearl River Delta: n-/l- Distribution

3.1.3.3 Initials n-/l- in Hong Kong Cantonese

Virtually all of the syllabaries (e.g., Wong 1987) and dictionaries of Cantonese maintain the distinction between the historical *n- and *l-. Zhan and Cheung's (1987:8) recent account does not note any loss of the *n-/*l- distinction for Hong Kong Cantonese. Yet it is obvious to even the casual observer of the language that the distinction is often not made in the actual speech of many Hong Kong Cantonese speakers. Samuel Cheung (1972:1-2) earlier affirmed the n-/l- variation in Hong Kong speech, also noting that unlike some reports of the speech of Guangzhou, the confusion of the two initials in Hong Kong is almost always toward l-. As discussed in the previous chapter, both Yeung (1980) and Pan (1981) report the l- for n- substitution to be widespread among many sub-groups in Hong Kong, especially young people and males. Also, both report the [n-] variant to correlate strongly with increased formality in style.

It is widely believed now that the initial /n/ has been entirely lost in the speech of Hong Kong Cantonese speakers under about the age of thirty. That many Chinese learners of English as a second language in Hong Kong apparently cannot differentiate between [n] and [l] in the initial position is often attributed to the loss of the /n-/ phoneme in their own language (Cheung 1972:1-2; Hashimoto 1972:120). The variable n-/l- is perhaps the most often discussed variable among language specialists in Hong Kong. Although the n-/l- variation is unconscious to many of those in whose speech it most often occurs, it is quite noticeable to many non-specialists, especially to those who maintain a distinction between the /n/ and /l/ initials.

There are a couple of misconceptions concerning this variable. One misconception is that it is a very recent phenomenon; the other is that the /n/ initial has been completely lost in the speech of those under about thirty. Neither notion is quite correct. More than forty years ago Y.R. Chao claimed that about one fourth of the speakers in Canton have no initial /n/ (Chao 1947). However impressionistic Chao's proportions, it is clear that many speakers of standard Cantonese (i.e., Guangzhou and Hong Kong) have used /l-/ for /n-/ for quite some time, very likely long before Chao's note. Only a reluctance on the part of most linguists to focus on variation instead of uniformity can explain the omission of more frequent descriptions. The heavy reliance on a single ideal informant, so important to the traditional descriptive method, tells us little about the speech of the average (or the non-average) speaker of the time and place being described. Most descriptive reports in this century admit very little, if any, variation between the /n/ and /l/ initials in Hong Kong. The two sociolinguistic reports that do exist on these variables (i.e., Yeung 1980 and Pan 1981) suggest that /n-/ is very nearly lost, or at least soon will be. The task here is to examine the above claims and to determine to what degree they hold true for various sub-groups and different situational contexts.

3.1.4 Velar nasal and Zero Initials

3.1.4.1 Initials ng-/0- in the Pearl Delta

Nearly all of the Chinese dialects in the Pearl Delta as described by Zhan and Cheung (1987) have both a velar nasal and a zero initial (see figure 6) and their distribution follows historical lines. Of the twenty-five Yue dialects and the six non-Yue dialects, only Panyu and Shunde are reported to lack the velar nasal completely. Jiangmen and Dongguan have both initials but they are not distributed precisely according to their historical counterparts. Note the distribution according to a few sample characters in tables 11 and 12 as taken from Zhan and Cheung (1987).

Figure 6 Dialect Map of the Pearl River Delta: Distribution of ng-/0-

Table 11
Realization of Historical *ng- in Pearl River Dialects

                      'art'      'danger'   'outside'  'silver'   'I, me'
                      ngaih      ngaih      ngoih      ngahn      ngoh

Bao'an                X          X          X          X          0
Cengcheng             X          X          X          X          X
Conghua               X          X          X          X          X
Dongguan              X          X          X          X          X
Doumen (Shangheng)    X          X          X          X          X
Doumen                X          X          X          X          X
Enping                X          X          X          X          X
Foshan                X          X          X          X          X
Gaoming               X          X          X          X          X
Guangzhou             X          X          X          X          X
Heshan                X          X          X          X          X
Hong Kong (city)      X          X          X          X          X
Hong Kong (N.T.)      X          X          X          X          X
Huaxian               X          X          X          X          X
Jiangmen              X          X          X          X          X
Kaiping               X          X          X          X          X
Macau                 X          X          X          X          X
Nanhai                X          X          X          X          X
Panyu                 0          0          0          0          0
Sanshui               X XX X
Shunde                0          0          0          0          0
Taishan               X          X          X          X          X
Xinhui                X          X          X          X          X
Zhongshan             X          X          X          X          X
Zhuhai                X          X          X          X          X

Key: X = realized as indicated; - = not an entry in source; [0] = zero initial prescribed; N.T. = New Territories

Table 12
Realization of Historical *0- in Pearl River Dialects

                      'banquet'  'duck'     'house'    'darkness' 'love'     'evil'     'peace'    'pressure' 'district'
                      an         ap         uk         am         oi         ok         on         at         au

Bao'an                X          X          X          X          X          X          X          X          ng
Cengcheng             X          X          X          X          X          X          X          X          X
Conghua               X          X          X          X          X          X          X          X          X
Dongguan              ng         ng         ng         ng         X          ng         X          ng         ng
Doumen (Shangheng)    X          X          X          X          X          X          X          X          X
Doumen                X          X          X          X          X          X          X          X          X
Enping                X          X          X          X          X          X          X          X          X
Gaoming               X          X          X          X          X          X          X          X          X
Guangzhou             X          X          X          X          X          X          X          X          X
Heshan                X          X          X          X          X          X          X          X          X
Hong Kong (N.T.)      X          X          X          X          X          X          X          X          X
Hong Kong (City)      X          X          X          X          X          X          X          X          X
Huaxian               X          X          X          X          X          X          ng         X          X
Huoshan               X          X          X          X          X          X          X          X          X
Jiangmen              X          ng         X          X          ng         X          ng         ng         X
Kaiping               X          X          X          X          X          X          X          X          X
Macau                 X          X          X          X          X          X          X          X          X
Nanhai                X          X          X          X          X          X          X          X          X
Panyu                 X          X          X          X          X          X          X          X          X
Sanshuei              X          X          X          X          X          X          X          X          X
Shunde                X          X          X          X          X          X          X          X          X
Taishan               X          X          X          X          X          X          X          X          X
Xinhui                X          X          X          X          X          X          X          X          X
Zhongshan             X          X          X          X          X          X          X          X          X
Zhuhai                X          X          X          X          X          X          X          X          X

Key: X = realized as indicated; - = not an entry in source; [ng] = velar nasal prescribed; N.T. = New Territories

As mentioned in 3.1.3.2 above, Zhan and Cheung's work does not focus on variation and therefore a good deal more variation may exist than is reported. Nevertheless, one can get a general idea of how the variable is distributed throughout the area to which Hong Kong belongs.

3.1.4.2 Initials ng-/0- in Hong Kong Cantonese

Unlike the n-/l- variable, the variable ng-/0- is a fairly complex one in Hong Kong Cantonese and establishing a base form is not straightforward. A substantial number of Hong Kong speakers (and many other Standard Cantonese speakers) clearly do not follow the prescribed forms given in most syllabaries, which are for the most part historically based. Again, unlike the n-/l- variable, the variation between ng-/0- can go in both directions. That is, some speakers drop the historical /ng/ initial while others add a non-etymological /ng/ to words beginning with a non-high vowel (i.e., historical zero initial). Even dictionaries and syllabaries do not always agree on the prescribed forms of the words involving the ng-/0- initial. Table 13 below is an adaptation of Pan's list (1981:6-7), with some minor modifications and additions from other sources, showing the prescribed form for a few sample words in Cantonese.

Table 13
Prescribed Realizations of Selected ng-/0- Lexis from Twelve Sources

Prescribed [ng]

          1 2 3 4 5 6 7 8 9 10 11 12

ngauh     XX X X XXX X X X X
ngaang    X X XXX X XX X - -
ngaih     XX X XX X X X X X X X
ngaak     X X X XX X XX X 0 - -
ngaih     XXXXXXX XX XX X
ngaahn    X X XX X X XX XX X
ngaam     X X X - XX XXX X - -
ngoih     XX XX XXX X XX X X
ngoh      XX XX X X XX XXX -
ngahn     XX XX X XX X XX XX
ngohk     XXXXXXXX X X X -
ngaaih    X X XX - XX X - - X
ngaaih    XX XXXX X X X ng/0 X
ngai      X XXXXXX XXX - -
ngahp     XXX XXX -- - X --
ngoh      XXXXXXX X X X XX

Table 13 (cont.)

Prescribed [0-] Initial

          1 2 3 4 5 6 7 8 9 10 11 12

aai       X XX X X X X ng X X
aai       X XXXXX X X ng 0/ng X -
aan       X XXXXX X X ng XXX
aap       X XXXXX X ng X XX -
aat       X XXX XXX ng X X - -
ai        XX X X - X X X ng XX -
ak        XX XX - X X XXXX -
am        X X XX X X X X ng X XX
uk        - X X X X X X X ng XX X
ou        XX XX X X X XXXX -
oi        XXXX XX X X ng X X X
ok        XX XX X X X X ng XX X
on        XX XX XXX X ng XX X
ung       - X XXXX X X ng X - -

Key: X = realized as indicated; - = not an entry in source; [0] = zero initial prescribed; [ng] = velar nasal prescribed

The initials as prescribed in table 13 above generally reflect a distribution according to tone register. Syllables beginning with zero initial (or alternatively with a glottal stop) are shown to occur mostly in the high (or yin) register and those beginning with velar nasal /ng/ are shown as occurring mostly in the low (or yang) register. The exception to the pattern is Huang's dictionary as shown in column 9, which lists all but two characters as having a velar nasal initial. Clearly most of the sources follow the historical pattern in the distribution of ng-/0-. Only Huang deviates, and does so in favor of the velar nasal.

A number of scholars have commented on the ng-/0- variable. Hashimoto (1972:141-145) offers an historical explanation for the variation that essentially reflects the forms as given in syllabaries. Yeung (1980) and Pan (1981) depart from the historical methodology to explain variation in terms of social background and speech situation. Both Yeung and Pan considered a speaker innovative if he uses a form of the ng-/0- variable that departs from the prescribed norm as historically reflected in the syllabaries. That is, if a speaker uses a velar nasal /ng-/ in a word that historically possessed zero initial, then that speaker is considered to be innovative. Likewise, if a speaker produces a zero initial in place of an historical velar nasal initial, he would also be considered innovative. Using this approach they found young people, males and the less educated to be more innovative as a group.

In the present study variation from the historical norm will also be correlated with social and contextual variables. The frequency of 0- is considered separately for tokens with *ng- and *0- initials.
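The scoring described above can be sketched in Python as follows. Each token is represented as a pair consisting of the historical initial and the initial actually produced. A token counts as innovative when the two differ, and the frequency of 0- is tallied separately for the *ng- and *0- classes. The token list is invented purely for illustration, and the function names are hypothetical.

```python
from collections import defaultdict


def is_innovative(historical: str, realized: str) -> bool:
    """A token is innovative when it departs from the historical initial."""
    return historical != realized


def zero_frequency_by_class(tokens):
    """Proportion of zero-initial realizations within each historical class."""
    counts = defaultdict(lambda: [0, 0])  # class -> [zero tokens, all tokens]
    for historical, realized in tokens:
        counts[historical][1] += 1
        if realized == "0":
            counts[historical][0] += 1
    return {cls: zero / total for cls, (zero, total) in counts.items()}


# Invented tokens: (historical initial, realized initial).
tokens = [
    ("ng", "ng"), ("ng", "0"), ("ng", "0"), ("ng", "ng"),
    ("0", "0"), ("0", "ng"), ("0", "0"), ("0", "0"),
]

if __name__ == "__main__":
    print(zero_frequency_by_class(tokens))
    print(sum(is_innovative(h, r) for h, r in tokens), "innovative tokens")
```

Keeping the two historical classes apart matters because a zero initial is the innovative variant for *ng- words but the conservative one for *0- words.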

3.1.5 Initial k-/h- in Third Person Pronoun

3.1.5.1 Pronouns Only

As a variable, k-/h- is very different from the other two variables already discussed in that it occurs only in the two colloquial third-person personal pronouns keuih 'he, she' and keuihdeih 'they.' Even a homophonous segment such as keuih in the compound keuihjyat 'to reject' is not known to have a variant with initial /h/. Though the variable has been noted before (Bauer 1982:29-30), it has not been studied systematically.

Since there is no standard character for the pronoun keuih, it is not certain to which historical category it belongs. The homophone keuih in keuihjyat 'to reject' mentioned above belongs to the historical category (qun initial) reconstructed as *g-, which became /k'-/ or /k-/ in modern standard Cantonese depending on historical tone category (Hashimoto 1972:623)^.

There is a group of words in Cantonese belonging to the Ancient Chinese *k'- (qi initial) that have developed into modern Cantonese /h-/. Examples of these words are /ho/ 'can,' /hau/ 'mouth,' and /hon/ 'to look at' among others (see Hashimoto 1972:642-643; Bauer 1982:30).

Similarly, *k'- has descended as /h-/ in many other Yue dialects as well. Chan (1980:219-221) reports that in Zhongshan the pronunciation of *k'- follows literary/colloquial lines. Most words with /h/ descending from *k'- are colloquial, while most words with the reflex /k/ are literary. Some words from the *k'- class have two readings, the literary version with /k/ and the colloquial version with /h/.

Though to my knowledge there are no reports of modern Yue dialects having /h/ as a reflex of *g-, the Southern Min dialect of Hainan is reported to have an /h/ as a reflex of both *k- and *g- (Yun 1987), reflecting a merger of the two historical initials.

Bauer suggests that the variation in keuih and keuihdeih may be a part of an ongoing /k'-/ to /h-/ change by analogy to the ancient *k'- initial group. Certainly his suggestion is plausible in light of evidence from other dialects, especially the Hainan case. It is possible that Cantonese too is moving toward a general deplosivization of both the qi (*k'-) and the qun (*g-) initial classes (cf. Hashimoto 1972:642-43).

3.1.5.2 Comparison to Other Chinese Pronouns

Since the variable k-/h- and the other two variables under study here involve personal pronouns, let us look at the personal pronoun forms in other Yue dialects and in non-Yue dialects of the area. Below is a chart (based on Norman 1988:220, 227, 234 and Zhan and Cheung 1988:418-19) showing the personal pronouns of some representative dialects. Note that transcriptions are phonetic.³

Table 14 Pronoun Forms in the Pearl River Delta

                              'I, me'     'you'       'he/she'

Yue

Bao'an                        (w)uo       lei         k'ui
Cengcheng                     goi         nei         k'oey
Conghua                       goi         ji          k'oey
Dongguan                      go          nai         k'ui
Doumen (Shangheng)            go          lei         k'i (k'ui)
Doumen                        9 guo       “ dei       k'ui
Enping                        9 gua       " di        k'ui
Gaoming                       go          ni          ky
Guangzhou                     go          nei         k'oey
Heshan                        go          nai         kui
Hong Kong (NT)                go          nei         k'y
Hong Kong (city)              go          nei         k'oey
Huoshan                       go          nei         k'oey
Jiangmen                      go          (nai) lei   k'ui
Kaiping                       " guoi      “ dei       k'ui
Macau (=Aumen)                go          lei         k'oey
Nanhai                        go          nai         ky
Panyu                         oi          lei         k'oey
Huaxian                       gai         nei         k'ui
Sanshui                       goi         nei         k'oey
Shunde                        gi          lei         ky
Taishan                       9 goi       " di        k'ui
Xinhui                        9 go        “ dei       k'ui
Zhongshan                     go          ni          k'ui
Zhuhai                        go          ni          k'y

Kejia

Conghua (Lutian)              gai (go)    ni (li)     t'a
Dongguan (Qingxi)             gdi (qo)    li          k'i
Huizhou (City)                goi         qi          k'y
Meixian                       gai         qi          ki
Hailu                         gai         qi          ki
Huayang                       gai         ni          tçi
Shenzhen (Shatoujue)          gai         li          k'i
Zhongshan (Nanlang Heshui)    gai         Ji          ki

Western Min

Jianou                        u           ni          ky
Jianyan                       gue         noi         ky
Chongan                       quai        nei         hou
Yongan                        quo         qi          qy
Shaowu                        hag         hien        hu
Jianle                        gai         ne          ky

Northeastern Min (Pearl River Delta)

Zhongshan (Longdu)            ua, go      ni          i

Although a number of area dialects are reported to use the innovative forms [0-] and [l-] with pronouns, the forms in the above table give us no strong evidence for dialect influence regarding the third person pronoun. Where the forms are cognate, they mostly share the same initial as the prescribed form in Hong Kong Cantonese. Only the Western Min dialects of Yongan and Shaowu show initial /h/ in the third person pronoun. Thus, there is no evidence of direct inter- or intra-dialect borrowing.

3.2 Independent Variables

The primary focus of this study is on how social and contextual factors affect the sociolinguistic variables described above in 3.1. I also examine the effect of the lexicon on variant choice. In the remainder of this chapter, I discuss the relevance of these influences to sociolinguistic study generally, and to the present study specifically. Although a wide range of potential factors was coded, only the most salient of those factors will be analyzed in the following chapters. Three main types of independent variables are discussed below: social, stylistic and linguistic. The social variables include gender (sex), age, place of origin (ethnicity), education level, parents' education, place of rearing, parents' place of birth, home language, and school language. Just one stylistic variable (speech register) and one independent linguistic variable (word class) were considered; word class is noted although it is not a main focus of this investigation. Other linguistic factors such as phonological environment and syntactic environment are beyond the scope and resources of the present study, though I plan to examine them in more detail in a future study. As a check against sampling bias, the social group of the informants and the interviewer were considered as possible factors influencing the sociolinguistic variables.

I analyze six of the above factors in detail in the following chapters: sex, age, education, place of origin, speech register and lexical category. Other factors were checked for gross effects only.

3.2.1 Social and Speaker Variables

There are a number of social variables that have become standard in most studies of sociolinguistic variation. Among the most commonly studied variables are sex and age. These two factors have become standard primarily because they have been shown to be significant factors in most studies where sociolinguistic variation has been found and because they are relevant to most theories of variation and language change.

When preparing the background questionnaire for this study I included a much wider range of questions than I planned to use in the quantitative analysis portion of the study. In addition to standard information such as sex and age, I collected information concerning each informant, including birthplace, place of origin, education, occupation, home language (dialect), school language (dialect) and language education (see appendix for full background data sheet). Additional background data concerning many of the above details were also noted for each informant's parents.

Sex has proved to be a significant factor in language variation in virtually every case where variation has been observed. Most studies that report variation claim that women are the conservers of the language; that is, they tend toward the overtly prestigious form of the language. The notion of females as conservers is not without counter-evidence, however (see Romaine 1978:155), and the results from the present study seem to contradict the claims that females tend toward conservative forms. Moreover, there is some evidence to suggest that women's role as conservers of the prestige form may hold more in Western culture, if it holds at all. For example, Walters (1989a; 1989b) reports that females tend to innovate away from the prestige norm in Tunisian Arabic. Some (see Horvath 1985:64) have argued that sex is the most important speaker variable, even more important than social class, to which the concept has sometimes been tied.

One explanation for the difference between the speech of men and women is that women attempt to use language to mark status for themselves where occupational prestige is less attainable. There have, however, been a number of objections to this point of view (Cameron and Coates 1985; Coates 1986). Alternatively, Coates (1986:93-94) argues that the relatively less dense social networks of women are not as apt to enforce the vernacular norm. Again, though, this view of women's social networks as less dense may be culturally bound. Certainly, there are cultures in which women have dense social networks (e.g., Japan). Thus we may have to look also to underlying aspects of sex-related variation. If men behave differently from women linguistically, then we may ask what it is about their situations that may promote such a difference.

Age is also a widely studied speaker variable. One of the reasons for this is that it figures prominently in theories of language change. Age can be treated as either a continuous or a discrete, categorical variable. Most studies on language variation have treated it categorically, probably because it is easier to conceptualize and to analyze that way. Often the researcher draws the lines at ten-year increments (0-10, 11-20, 21-30, etc.). Alternatively, the researcher may want to reflect certain natural categories such as school age, working age, retirement age, etc. The problem with a categorical treatment, however, is that it is more subject to judgmental error than a continuous treatment. For example, a researcher may inadvertently divide the age range in a way that hides certain effects.

Another potential problem in drawing category boundaries is that they may not work across cultures. In the West it is reasonable to make a division at about the age of 18, the age at which most young people finish their secondary education and begin either work or a college education. It is clear, however, that this division would not be so natural in many societies of the world, specifically those societies whose education systems are very different from those in the West. Since Hong Kong's education system is based on a Western (mostly British) model, it is reasonable to draw the category lines roughly as they would be drawn in the West. The age at which people enter full-time employment and marry is also similar.

There are some differences, however, that could affect divisions. For example, young people tend to live longer with parents for cultural and economic reasons, often after marriage as well. Nevertheless, the lines drawn for most Western-based studies are mostly appropriate for Hong Kong society as well.

Though I coded my informants on a continuous basis, I chose to split my informants into four age groups for the purpose of the main statistical model: 1-18, 19-30, 31-45, and 46 and over. As in many Western countries, eighteen is an age at which most young people enter either employment or higher education. Consequently, it is a natural social division for many people as well. Between the ages of nineteen and thirty people often pass through critical stages of life such as marriage and parenthood. The other two categories are more arbitrarily drawn and represent convenient divisions. However, because Hong Kong has experienced large-scale immigration in the last forty or fifty years, the over-forty-five group also corresponds to the larger portion of those born outside of Hong Kong.
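The categorical grouping described above can be illustrated with a short sketch. The function name, the group labels and the use of whole-year ages are assumptions made for illustration only; the cut points (18, 30 and 45) are those given in the text, and nothing here reproduces the coding procedure actually used in the study.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first three age groups named in the text.
# Ages above the last bound fall into the open-ended oldest group.
UPPER_BOUNDS = [18, 30, 45]
# Hypothetical labels; the text describes the youngest group as "1-18".
LABELS = ["18 and under", "19-30", "31-45", "46+"]


def age_group(age):
    """Map an age in whole years to one of the four categorical groups."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_left returns how many upper bounds are strictly below `age`,
    # which is exactly the index of the group the age belongs to.
    return LABELS[bisect_left(UPPER_BOUNDS, age)]


# Continuous coding is kept; the grouping is applied only for the model.
for age in (12, 18, 19, 30, 31, 45, 46, 70):
    print(age, age_group(age))
```

Keeping the ages in continuous form and deriving the groups from them means the boundaries can be redrawn later without recoding the informants.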

The concept of 'place of origin' is important in Chinese culture. The concept does not imply foreign origin in the same way as ethnicity might in North America or many other places in the West. Indeed, being Chinese is a given. However, within Chinese culture as a whole there has always been a strong sense of identification with one's home village. This is so even with families that have lived in a new country or another geographical region of China for many generations. Thus, many (pre-mid 1960's) Chinese in the U.S. are not just Chinese but Cantonese, or more specifically Taishanese. The concept of place of origin is in many ways similar to that of ethnicity, though it has much more to do with locale than cultural tradition. The many Cantonese villages and districts in the Pearl Delta area, of which Hong Kong is a part, have few significant cultural differences, yet people having origins in those various locales usually make clear, specific identification with their home villages and/or districts. Only part of this can be explained by dialect differences. Lacking any comprehensive study on ethnicity, we are constrained to define it exclusively in terms of language.

For this study I have grouped all my informants into five area groups with respect to their stated place of origin: Guangzhou and area, Siyi (Seiyap) area, N.E. Pearl Delta, Zhongshan and area, and Other. Place of origin was not originally a main focus of my research plan, and it is important to note here that I divided my informants after the collection of my data. Therefore I did not approach any kind of evenly stratified sample with respect to this variable. As such, I made my divisions partly for practical reasons. Most of my informants claimed their place of origin within what Yuan (1983) (see chapter 1 above) calls the Coastal Yue Area. However, I did not have an adequate number of informants from most villages and districts within the area to make a quantitative analysis workable.

Therefore, I use two of Yuan's divisions for this study: Coastal Yue and Siyi. However, I also divide the Coastal Yue Area into two: Zhongshan on the one hand and the other Coastal Yue villages on the other. As I mention in chapter one above, I believe there is some justification for keeping them separate given Zhongshan's relative physical separation from the other Coastal Yue villages. The other two groups are not primarily Yue regions at all. The Northeastern Delta is a mixed Yue and Min area; the Other category is a catch-all for those that do not fit elsewhere.

Below are my five groups and some of the locales found in each:

Guangzhou and Area: Guangzhou City, Hong Kong, Macau, Bao'an, Shunde, Panyu, Nanhai, Dongguan, Gaoming, Sanshui

Zhongshan and Area: Zhongshan City and Tanzhou

Siyi and Area: Taishan, Kaiping, Xinhui and Enping

N.E. Pearl Delta: Huiyang, Huidong

Other: Shanghai, Jiangsu, Qingyuan

As Milroy has pointed out (1987:97-101), the notion of social class is complex. It is often presented as a sort of composite variable including such things as wealth, power, and prestige, and it is sometimes, but not always, associated with education and occupation. Milroy suggests that social class can be seen as a 'proxy' variable (1987:101) for other speaker variables.

Though social class has become a standard variable for many studies following Labov's model, it is a concept fraught with difficulties. As Milroy notes (1987:98-99), how social class is treated as a linguistic concept has much to do with which theory of social class one accepts (i.e., Marxist or Functionalist (capitalist)). While Marxist theory divides people according to means of production and as part of a historical process, Functionalist theorists see class as arbitrary and unfixed. Thus, the theory to which the investigator subscribes affects how he designs his study and how the results of that study are interpreted.

Virtually all variation work has been done in the West and so the Functionalist model of social structure has become the standard. Nevertheless, the Marxist model is potentially relevant to Chinese data in view of the Marxist form of government on mainland China. Whether or not one accepts the Marxist theory of social evolution, Marxist notions of class have become part of the culture of the people who live in mainland China, though many traditional values and attitudes remain. Hence, the 'work unit,' for instance, is likely to have a different social connotation in China than in non-Marxist countries.

In contrast, Hong Kong has an avowedly capitalistic economic system. As such, the Functionalist model of social class might be more appropriate for Hong Kong. But even within the Functionalist model it is not always clear which indicators are best for determining social class.

Another practical difficulty comes in assigning social class. For example, Milroy (1987:102) notes that in many stratification studies, women are assigned social class somewhat arbitrarily, sometimes according to that of their husbands or fathers and sometimes according to their own occupations (when they have outside employment).

Clearly, the notion of social class is problematic and a fuzzier notion than the related notion of social prestige. In this study I have opted to use other indicators of social status, including education and occupation. These two factors are often taken as components of social class in the Functionalist model and therefore may be of overlapping theoretical interest. Considering education and occupation separately is a simpler and more workable approach than social class. Care needs to be taken, however, to note the relationship between these and all speaker variables.

In summary, the social variables considered for this study include sex, age, place of origin (ethnicity), education level, parents' education, place of rearing, parents' place of birth, home language, and school language. Primary focus is given to sex, age, place of origin and education.

3.2.2 Contextual Variables

In addition to speaker variables of the type discussed above we will consider speech register as a variable. Labov and others following his basic framework have relied on the interview as the basic tool to collect speech data. The interview is then divided into casual and formal, casual being that part of the interview where the constraints of the interview are overridden. The shift from casual to formal or vice versa is marked by what Labov has called 'channel cues' (1972b:94). These include changes in speech rate, loudness, and/or breathing.

Along with the casual and formal styles of his interviews Labov elicited other data by having his informants read material of what he suggested to be increasing levels of formality. These three styles of text were: narrative, word list, minimal pair word list. These he orders along with conversation styles to form a continuum from least formal to most formal, according to how much attention a speaker pays to his speech.

With respect to stylistic variables, this study departs from the Labov model in several ways. I consider three speech registers: impromptu conversation, interview, and public speech. Unlike Labov, I made a point of separating the interview and casual conversation in the collection process (see chapter 4 for details). A conscious effort was made to keep the interview situation unambiguously an interview situation (cf. Wolfson 1982). The data for the most formal register were collected by recording public speeches.

My purpose in choosing the three situational contexts above is that they are all naturally occurring and distinct speech situations. Unfortunately, there have yet to be any formal studies carried out concerning folk categorization of speech types in Chinese. Nevertheless, it is reasonable to assume that people perceive a public speech, an interview and a casual conversation to be distinctly different speech situations. Moreover, if people were asked to list different kinds of speech situations, the three situations under consideration here would likely be among those listed.

The assertion of three natural speech categories does not imply that there is no overlap in the kind of speech used in the three situations, or that they are the only potentially interesting situations. Within any category there are clearly many types of speech events. A public speech for example often has digression, short narratives, occasional quotations, etc. As Labov (1972b:87-99) has discussed, within the context of an interview many variations can be noted. Likewise, any stretch of speech might be further sub-divided.

In addition to their naturalness, all three categories have to do with spoken language and thus avoid potential complications with issues of reading behavior and literacy. There are both theoretical and practical objections to reading styles. In spite of his emphasis on the vernacular as a preferred object of study, Labov nevertheless relies heavily on the written form to obtain his data. Three of Labov's contexts are reading contexts, albeit in varying degrees of formality. Reading aloud from a text is an uncommon activity for most people; reading from word lists is not done outside of experimental work in linguistics and related fields. Thus if we are interested in how people really use spoken language, then reading data are not well suited to our purpose.

Another problem with reading data is that it excludes illiterate informants. The issue of literacy is especially important in Chinese societies where there is a significant percentage of illiterate people. Though the literary tradition is great in China, many people— especially rural people— are completely or partially illiterate. This is notably so of the older generation. Thus by choosing reading styles one inadvertently biases and narrows the sample.

3.2.3 Linguistic Variables

In addition to social and contextual variables, quantitative studies of variation often take into account the tendency of a certain linguistic environment to affect the choice of a given variant. This tendency is often expressed in terms of a probability that a rule will apply in a given environment. The best-known framework for the discussion of rule probability is Cedergren and Sankoff's Variable Rules program (1974). These environments can be phonological, syntactic or lexical.

Though potentially interesting, a treatment of phonological and syntactic environment is beyond the scope of this study. The linguistic context of each token is nevertheless examined for gross effects.

Much of what has been done in recent years on variation in Chinese has been aimed at testing assumptions concerning the theory of lexical diffusion (Wang 1969). A number of studies on Chinese have presented evidence to support Wang's original claim that sound change can occur abruptly, morpheme by morpheme, by moving through different classes of the lexicon (see and Hsieh 1971; Chen and Wang 1975; Hsieh 1972; Wang and Cheng 1970). Whether or not lexical factors can be said to support the theory of lexical diffusion, lexical variation has been reported in a number of studies. Sometimes lexical variation is a matter of parallel competing systems (e.g., pronoun systems (Kerswill 1984); tense systems (Milroy 1987:131-32)). Other times the variation seems to be a result of apparently idiosyncratic behavior within a word class (Fasold 1978; Neu 1980). There is yet another practical reason to pay attention to the lexicon. If a particular word or class of words behaves in a radically different way than others, the data may be subject to distortion. This is especially true if the word or class in question occurs frequently in the data. Encountering this problem, Labov (1980:xvi) excluded the word 'and' from his study of final stop deletion because it showed a high frequency of deletion. In the analysis of the Cantonese data in this study, tokens occurring in pronouns and demonstratives are analyzed both together and separately because of their apparently idiosyncratic behavior and because of their high frequency of occurrence. The sum of the tokens for all speakers is also analyzed as a group to check for the effects of word class on production of variants.

3.2.4 Other Factors

There are a number of other factors that were briefly examined to see if they might have distorted the data in any way. The data were coded for interviewer and for social group (church, work, family, other) to see if there were any gross influences on production of each variant. For example, the interview data were recorded both by myself (non-Chinese, male, 31 years old) and by an assistant (Chinese, female, 27 years old). Our personal backgrounds very likely could have had an effect on what sort of speech the informants used when interacting with us. These personal attributes no doubt have an effect on the psycho-social space noted above. Although coding for interviewer and social group cannot address the complexity of variables such as social class and psycho-social space, it can signal potential problems where the variation is great. Table 15 below summarizes the variables coded for in this study. Those independent variables marked with an asterisk are the main focus of the following chapters.

Table 15 Summary of Independent and Dependent Variables

Independent                        Dependent

Social
  *Sex                             ng-/0- (*ng-, *0-)
  *Age                             n-/l- (*n-)
  *Place of Origin                 k-/h- (in 3rd person pronouns)
  *Educational Level
   Parents' Educational Level
   Place of Rearing
   Parents' Place of Birth
   Home Language
   School Language
   Social Group

Contextual
  *Speech Register

Linguistic
  *Lexical Category

Other
   Interviewer

NOTES TO CHAPTER THREE

1. Sources for the lexis in table 3.3 are taken from the following dictionaries and syllabaries:

1 Williams 1856
2 Chalmers 1907
3 Aubazac 1917
4 Cowles 1965
5 Wells 1931
6 Meyer-Wempe 1935
7 S.L. Wong 1938
8 Macau Government 1962
9 Huang 1970
10 Lau 1977
11 She 1982
12 Zhan and Cheung 1988

Some of the above sources are not available to me for verification (1, 2, 3, 5, and 8). Where they are not available, I am accepting Pan's notes directly.

2. The modern Cantonese reflex of the ancient qun initial in the colloquial layer is [k'] in the ping and shang tones and [k] in the qu and ru tones. In the literary layer the qun initial is [k'] in the ping tone and [k] elsewhere (Hashimoto 1972:623).

3. I have made minor changes in Zhan and Cheung's transcriptions for typographical reasons. The primary difference is my use of o for Zhan and Cheung's o. In addition, there are some discrepancies between Zhan and Cheung's syllabary volume (vol. 1) and the lexicon volume (vol. 2). Sometimes the differences are crucial to this study. For example, the Sanshui form for the second person pronoun neih is given as /lei/ in volume 1 and as /nei/ in volume 2. Since volume 2 gives more attention to colloquial forms, the representations in table 14 basically follow volume 2.

CHAPTER IV

METHODOLOGY AND RESEARCH DESIGN

4.0 Methodology and Research Design

The methodology of sociolinguistic research is rightly of primary concern to the field-worker. The context and conditions under which the data are collected are of special importance in sociolinguistic research since it is precisely the interaction of society and language that concerns the investigator. Unlike in general linguistics, methodology is primary and can have a major effect on the outcome of the study. There is now a substantial literature on quantitative sociolinguistics and its methodology. For a complete discussion of sociolinguistic methodology, the reader is referred to Milroy (1987). This section outlines some of the key technical issues surrounding the design and implementation of the present study.

4.1 Sampling Design

4.1.1 Random and Judgmental Sampling

The ideal of a true random sample has rarely been achieved in sociolinguistic research. Labov attempted a random sample in his (1966) study but encountered a number of problems in the process. His sample was itself a sub-sample of a larger sample originally done for a sociological research project in New York City. Of a sample of 340 original cases, only 88 were ultimately used. The others were rejected for a variety of reasons including death, illness, non-local origin, or refusal to co-operate.

There are a number of reasons why random sampling is not practical in most sociolinguistic studies. Seldom are there adequate resources available to carry out the time-consuming and expensive task of achieving a true random sample. Because of inherent problems in using random sampling in linguistic fieldwork, most researchers have adopted what is usually called a judgmental sample. Milroy (1987:26) defines a judgmental sample as follows:

The principle underlying a judgement sampling is that the researcher identifies in advance the types of speakers to be studied and then seeks out a quota of speakers who fit the specified categories. A good judgement sample needs to be based on some kind of defensible theoretical framework; in other words, the researcher needs to be able to demonstrate that his or her judgement is rational and well-motivated.

The following sections discuss the composition of and motivation behind the selection of the sample used for this study.

4.1.2 Sample Size

Large samples of the type often used for sociological surveys are often not practical for linguistic studies. Moreover, as Sankoff (1980:52) notes, large samples are not as crucial to linguistic studies as they are to many other kinds of studies:

The literature, as well as our own experience, would suggest that even for quite complex communities samples of more than about 150 individuals tend to be redundant, bringing increasing data-handling problems with diminishing returns. It is crucial, however, that the sample be well chosen, and representative of all social subsections about which one wishes to generalise.

Milroy (1987:21) suggests that the reason why larger samples are not as critical as in other studies is that linguistic behavior is apparently more homogeneous than many other types of behavior studied by surveys.

The present study employs three data bases, each with a different focus. The main data base contains data for all three variables (ng-/0-, n-/l- and k-/h-) for 82 cases^ and studies the effects of the background variables discussed in the previous chapter, for example sex, age, and education.

The second data base is a subset of the first data base and consists of 14 informants, each recorded in three speech registers: impromptu, interview and public speech. Only the variable of register is analyzed in this sub-study, though background information is available for all informants. Consequently, fewer cases are needed to get an accurate analysis of register as a factor. These first two data bases and the resulting analyses are separated for practical reasons. The process of obtaining data at three speech registers is difficult and time-consuming. Therefore, to obtain all three registers for all informants of the main data base is beyond the resources of the present study. The 82 cases of the main data base are within the range suggested by Sankoff for the most complex studies and as such should be adequate for the purposes of this study. Moreover, the 14 cases are sufficient to examine the one variable of speech register.

Table 16 below shows the distribution of the informants according to two of the key background factors.

Table 16 Distribution of Informants in Main Data Base by Sex and Age Group

ROWS: sex    COLUMNS: age category

            6-18   19-30   31-45   46+   ALL

Males          5       9      28     7    49
Females        7      13       9     4    33
ALL           12      22      37    11    82

The third data base was created post-hoc from the main data base by combining the tokens of all informants. In the early stages of my research it appeared likely that certain words were more prone to take one form over another. However, splitting the individual data for each informant by word class often left me with too few tokens (see the following section for discussion on token numbers). I decided then to combine all tokens into a single file to examine their distribution according to word class. This procedure left me with a large number of tokens of the three variables: *ng- = 2711, *0- = 366, *n- = 2279. Since preliminary checks of my data found virtually no *l- > [n-], I did no analysis of *l- tokens. Moreover, because word class is not an issue for the variable k-/h-, no master token list was compiled for that variable.

Strictly speaking, post-hoc design is not preferred. However, the early patterns were suggestive enough that exploratory analysis is justifiable here.
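The pooling step behind the third data base can be sketched roughly as follows. The token records, informant identifiers and word-class labels below are invented for illustration and are not data from this study; the sketch only shows how tokens might be combined across informants and tallied by word class.

```python
from collections import Counter, defaultdict

# Hypothetical token records: (informant id, word class, realized variant).
tokens = [
    ("S01", "pronoun", "l"),
    ("S01", "noun", "n"),
    ("S02", "pronoun", "l"),
    ("S02", "demonstrative", "l"),
    ("S03", "noun", "n"),
    ("S03", "pronoun", "n"),
]


def pool_by_word_class(records):
    """Ignore the informant and count variants within each word class."""
    pooled = defaultdict(Counter)
    for _informant, word_class, variant in records:
        pooled[word_class][variant] += 1
    return pooled


pooled = pool_by_word_class(tokens)
for word_class in sorted(pooled):
    counts = pooled[word_class]
    total = sum(counts.values())
    print(word_class, dict(counts), "N =", total)
```

Pooling in this way trades speaker-level detail for larger cells per word class, which is why it suits an exploratory check rather than the main statistical model.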

In summary, the three data bases and their contents can be outlined as follows:

Data Base 1 (Main Data Base)

1. dependent variables: ng-/0-, n-/l- and k-/h-
2. independent (background) variables: sex, age, place of origin, education, etc.
3. 82 cases (49 informants)

Data Base 2 (Subset of Main Data Base)

1. dependent variables: ng-/0-, n-/l- and k-/h-
2. independent (background) variables: focus on register; sex, age, place of origin, education, etc. also noted
3. 14 cases (14 informants), all recorded at three levels: impromptu, interview and public speaking

Data Base 3

1. dependent variables: ng-/0- and n-/l-
2. independent (background) variables: word class only
3. data from all 82 cases of data base 1 combined by word class

4.1.3 Token Numbers

In non-experimental studies of natural data such as this one, we encounter problems concerning varying lengths of speech samples and the related problem of varying numbers of tokens. Even if we were to standardize the length of speech segments (not necessarily a desirable goal), the number of tokens of a given variable is not likely to be equal. The problem of unequal token numbers is inherent in working with natural speech data. The most straightforward way of dealing with this problem is to convert the outcome for each case to a proportion. For example, if a stretch of speech by an individual speaker contains 10 tokens of the variable n-/l-, 6 of which are realized as [l] and 4 as [n], then the /l-/ frequency for that particular speaker would be .60 (or alternatively 60%). Converting a raw frequency to a proportional score allows us to put all data in comparable terms.
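The conversion to a proportional score can be sketched as below, using the worked example from the text (6 of 10 tokens realized as [l]). The function name and the list representation of tokens are assumptions for illustration.

```python
def variant_proportion(realizations, variant):
    """Return the share of tokens realized as `variant` (0.0 to 1.0)."""
    if not realizations:
        raise ValueError("no tokens: a proportion is undefined")
    return realizations.count(variant) / len(realizations)


# The worked example from the text: 10 tokens of n-/l-, 6 as [l] and 4 as [n].
speaker_tokens = ["l"] * 6 + ["n"] * 4
l_frequency = variant_proportion(speaker_tokens, "l")
print(f"{l_frequency:.2f}")  # prints 0.60
```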

The problem comes when considering small numbers of tokens. For example, a frequency of .67 obtained from 2 of 3 tokens would not be credible. But how many tokens can be said to be a reliable reflection of a speaker's performance? The topic of minimum token number has been discussed by Guy (1980) and Romaine (1980:190-3). Guy points out that N=30 is an important dividing line in statistics and that 30 should be the minimum token target. Anything below 10 moves toward a high probability of random fluctuation. If the total N is to be subdivided, then the number of tokens needs to increase in proportion to the number of sub-divisions. For example, if we divide a segment by part of speech or by speech register (as we have here), then the number of tokens for each register or part of speech should be as close as possible to the N=30 mark.

Romaine (1980) suggests that the researcher pay attention to the type of variable being studied, since the sensitivity of variables to sample size is not constant. Sometimes a frequently occurring word or word class can affect the number of tokens needed, particularly when a word exhibits idiosyncratic behavior. This problem becomes crucial in the present study, where all three variables involve pronouns and the n-/l- variable involves demonstratives as well, both high-frequency classes.

In this study I accept Guy's N=30 guideline. As a check against distortion from the high-frequency classes of pronouns and demonstratives, scores for these classes were calculated separately in the event that they behaved significantly differently from the other classes.

4.1.4 Suggested Length of Speech Segments

The length of the speech segment to be used in the analysis of sociolinguistic variables depends on many factors. However, the desirable length is to a large extent determined by the accessibility of tokens of the variable being studied. Useful data with an appropriate number of tokens for frequently occurring phonological variables might be obtained in a 30-minute stretch of speech. On the other hand, some variables (e.g., syntactic variables) may take many hours to yield even a very small number of tokens.

If one of the factors being studied is speech style or register, then one must obtain a suitable amount of data in all three styles. Yet another matter comes into play here.

If access to vernacular through the formal interview is an issue (as it has been for most studies in Labov's framework), then it is desirable to allow more time to access the vernacular by overcoming constraints of the interview. Since for this study I chose to obtain vernacular (or more precisely, impromptu speech) outside the interview framework, it was not necessary to lengthen the interview portion of the data per se. However, I did need to spend additional time in the presence of informants to obtain impromptu speech data.

The interview portion of this study was approximately 30 minutes in length, though circumstances sometimes allowed for a longer session. The length of the impromptu speech recordings was 20 minutes or more, depending greatly on the circumstances. Lastly, the public speeches were all very close to 20 minutes long.

Since all three of the Cantonese variables occur frequently in all kinds of speech, recording time was generally not a problem. I was usually able to obtain an adequate number of tokens within the time limits noted above. When I failed to obtain a minimum of 10 tokens, I entered a missing value code and the cell was omitted in the analysis.
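The missing-value rule just described can be sketched as follows. This is an illustration only: the constant names, the use of `None` as the missing value code, and the example counts are assumptions, not details taken from the original data files.

```python
MIN_TOKENS = 10  # minimum stated in the text; cells below it are dropped
MISSING = None   # hypothetical stand-in for the missing value code


def cell_score(n_variant, n_total, min_tokens=MIN_TOKENS):
    """Proportional score for one cell, or MISSING if too few tokens."""
    if n_total < min_tokens:
        return MISSING
    return n_variant / n_total


def mean_score(cells):
    """Mean of the valid cell scores; cells coded MISSING are omitted."""
    scores = [cell_score(v, n) for v, n in cells]
    valid = [s for s in scores if s is not MISSING]
    return sum(valid) / len(valid) if valid else MISSING


# (variant count, total tokens) for three hypothetical cells.
cells = [(18, 30), (2, 3), (12, 40)]
print([cell_score(v, n) for v, n in cells])  # [0.6, None, 0.3]
print(mean_score(cells))
```

The second cell (2 of 3 tokens) would yield a score of about .66 if it were kept, but it rests on too few tokens to be credible and so is excluded before any group mean is computed.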

The crucial consideration, then, is for the researcher to determine in advance how much recorded speech is needed to obtain an adequate number of tokens of each variable to be studied, while keeping in mind basic statistical guidelines.

4.2 Composition of the Sample

Most of the informants recorded for this study were people with whom my assistant or I had a social or professional relationship. Below is an outline of the groups from which my data is drawn.

4.2.1 Main Sample: Church Groups

The largest part of the sample for this study was drawn from several local congregations of the Church of Jesus Christ of Latter-Day Saints, but mostly from the congregation that my family and I attended during the course of my fieldwork in Hong Kong. Because of a previous year-and-a-half stay in Hong Kong as a church volunteer, I was acquainted with many of the members of the congregation before returning for my fieldwork. Shortly after arriving, I was asked to work in a volunteer position in the church organization, as is normal practice in the church in Hong Kong and elsewhere. Church associations made up the largest part of my close associations among the Chinese community; they were also the most accessible and willing to participate in my study. Working with acquaintances allowed for access to a much wider range of speech than would ordinarily be possible.

The congregation that I attended was fairly typical of other congregations in Hong Kong. The congregation consisted of 289 members, of which about half were regular attenders. All members were Chinese (excluding my own family). The congregation's characteristics were roughly similar to Hong Kong as a whole. The sex ratio of the congregation was identical to that of Hong Kong as a whole and the congregation ranged in age from newborn to 81 years of age with a mean of 27 years. See Tables 17 and 18 below.

Table 17 Distribution of Congregation by Sex and Age Group.

ROWS: sex   COLUMNS: age group

            6-18    19-30   31-45   46+     ALL

Male        30      77      29      20      156
            10.38   26.64   10.03   6.92    53.98

Female      35      63      28      7       133
            12.11   21.80   9.69    2.42    46.02

ALL         65      140     57      27      289
            22.49   48.44   19.72   9.34    100.00

CELL CONTENTS: COUNT, % OF TBL
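As a check on the cell contents of Table 17, the "% OF TBL" figures can be recomputed from the counts. The sketch below reads the Female row as 35, 63, 28, 7 and 133, which is what the printed percentages imply; the variable and function names are assumptions introduced for illustration.

```python
# Counts by sex and age group, as read from Table 17.
age_groups = ["6-18", "19-30", "31-45", "46+"]
counts = {
    "Male": [30, 77, 29, 20],
    "Female": [35, 63, 28, 7],
}

# Total membership of the congregation (all cells combined).
total = sum(sum(row) for row in counts.values())


def pct_of_table(count, table_total):
    """Express one cell count as a percentage of the whole table."""
    return round(100 * count / table_total, 2)


for sex, row in counts.items():
    cells = [pct_of_table(c, total) for c in row]
    print(sex, sum(row), cells, pct_of_table(sum(row), total))
```

Each percentage is taken over all 289 members rather than within a row or column, which is why the row totals themselves sum to 100.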

Table 18 Distribution of Hong Kong Population by Sex and Age Group (Adapted from 1981 Census, Basic Tables: 6-7.)

ROWS: sex   COLUMNS: age category

            0-19    20-29   30-44   45+     ALL

Males       19.4    11.8    10.2    12.7    54.08

Females     14.5    10.4    8.1     13.0    46.02

ALL         33.99   22.14   18.32   25.6    100.00

CELL CONTENTS: % OF TABLE

Occupations ranged from street vendors to professionals and years of formal education from zero to twenty-two. In short, the characteristics of the congregation are not grossly different from Hong Kong as a whole. I do not claim this group to be strictly representative of the Hong Kong speech community. However, when combined with supplemental data, it provides a reasonable sense of the larger community. In my judgement there is no indication that the church congregation differs in any important way from Hong Kong society as a whole. Nevertheless, as a check against gross differences resulting from social group, I compared the data from the church groups with data from the other informants.

4.2.2 Academic Associations

Another group with whom I had regular and prolonged contact was the academic community. My association with the Chinese University of Hong Kong's Anthropology Department and my participation in activities there brought me into everyday contact with both faculty and students, many of whom agreed to serve as informants. Although the university community is unusual in terms of education and social background, it served to supplement the main sampling group above.

4.2.3 Additional Data

In addition to church and academic contacts, I obtained small amounts of recorded data from several other sources including neighbors and casual contacts. A retirement home provided yet another source. These data are included in the main data base for quantitative analysis.

I also obtained data from television and radio. These data are intended to serve as supplementary data and not as part of the data base. To get a sense of how the variables were used in the recent past, I recorded old Cantonese movies, mostly from the 1940's and early 1950's. To get an idea of current standard pronunciation, I recorded television and radio news readers. Lastly, I recorded radio talk shows and call-in shows of various types. Many of the talk shows consisted of groups of 3-6 in their late twenties or early thirties discussing current and often controversial topics. The format of the show along with the group effect often produced casual speech. However, background information on the participants was not available and so I did not include these data in my formal data base.

4.3 Investigator's Role

4.3.1 Participant-Observer

One of the most difficult and daunting tasks for a sociolinguistic researcher is gaining access to a community to the degree that he or she is able to observe and record an adequately wide range of language. The usual method is to attempt a face-to-face interview with as large a group as resources and time allow by using the kind of judgmental sampling technique outlined in 4.1 above. Although the interview technique is an efficient method of collecting data, the technique also has the potential of introducing outside influences into the situation. Thus the investigator's effect on the data has become a major issue in nearly all sociolinguistic studies. One example is the notion of the 'observer's paradox' as first raised by Labov, which describes the tendency for the observer to affect the speech he is observing. Labov's response to the problem was to find ways to overcome the observer's paradox, usually by encouraging passionate expression to overcome self-monitoring and encourage use of the kind of vernacular that the speaker presumably would use when the investigator was not present. Another technique is to gather data in groups where peer pressure tends to overcome the propensity to accommodate to the outside investigator's presence.

Yet another approach to the problem of the observer's paradox is to become a participant-observer (cf. Milroy 1985). The participant-observer approach makes the researcher an insider to some degree, who is therefore less likely to have an unnatural influence on the speech of those being studied.

Another important advantage of the participant-observer approach is that the researcher has access to a wide range of natural speech situations. Where Labov's approach was to overcome the formal constraint of the interview situation, the participant has much less need to manipulate since he or she is a part of the context. Beyond the quality and range of data collected, the participant-observation method allows the researcher much more opportunity to observe the context of linguistic behavior and thus the possibility of a much more complete explanation of the facts. Most of this kind of information is non-quantitative in nature but often critical to our understanding of the facts and to determining future direction of research.

The participant approach has some inherent problems along with its advantages. The first point is that the investigator as a partial outsider does in fact affect the data in a way that it would not be affected if he were not to enter the situation. Norberg (1980:5) stresses the importance of the fieldworker being able to fit into some acceptable category in the community:

It is important to bear in mind that he/she has a defined social position in the local community, which in turn gives rise to a varying social role in relation to the interviewees. And I think this social position is more apparent to the informant and the variation in role relations more marked than if the interviewer is an outsider and not in a particular relation to the speakers from the outset. The same thing probably applies to smaller communities compared with larger ones.

Another potential problem of the participant method arises when the investigator attempts to fill all the categories of a judgmental sample. This was not a serious problem in my experience since the groups in which I was a participant were fairly large. In addition, I had a large number of friends and associates from previous stays in the area.

It is also often noted that this method is extremely time-consuming, demanding, and even wasteful in terms of resources. It is true that a researcher using this method can make use of only a small portion of the speech data observed or even of that which is recorded. Some of the data is technically unusable; other data is confidential or personal and therefore inappropriate.

Though much of what can be learned using the participant-observer method is not quantifiable, the information is nonetheless valuable. The amount of time needed for this kind of study also depends greatly on prior associations with the group to be studied. In my case I was able to record effectively almost immediately.

In addition to the points noted above, there are potential technical problems with the participant-observer method. One is that the method does not produce high quality recordings because the equipment cannot always be positioned well if the investigator is to maintain participant status and transcend outsider position. Recent advances in the quality of recording equipment, along with a concurrent decrease in size, have rendered this objection much less crucial.

In spite of certain weaknesses of the participant-observer approach, it is clear that it gives us the widest access to natural speech situations. Although the method is often time-consuming, it offers great rewards as well when it is feasible to carry out.

4.3.2 Investigators' Role

In my study of variation in Hong Kong there were two interviewers: a local Cantonese-speaking woman and myself. The division of the interview work was roughly even. In the final analysis, though, I used more of my own data, primarily because the technical quality was often better. Both of us were clearly in the role of participant-observer since virtually all of our informants were drawn from our normal social circles (church, work, family, etc.). In my case I told potential informants that I was doing a project having to do with Hong Kong language and culture and asked if they would be willing to talk to me. All were aware of my position as a visiting researcher in the Anthropology Department at the Chinese University of Hong Kong.

My assistant told people that she was an assistant to a researcher at the Chinese University of Hong Kong studying the same subject. After the recording session we offered further explanation. In practice, few people were interested beyond the brief description that we provided them since we were generally well acquainted and needed no particular justification. The explanations were given more to ensure that ethical standards were met than to respond to any misgivings on the part of our informants.

As discussed in the previous section, inherent in any interview situation are a number of biases and influences. As participants in the interview portion of our data and to some extent in the impromptu portion, we recognize those biases. It is not possible to control for every important influence. Therefore, it is important to identify potential biases and to consider their impact. It is reasonable to assume that I, as a foreigner, might have affected the data in some special way as I interviewed informants. There are, however, some special circumstances concerning my role among those I interviewed.

In spite of my foreignness, I was and am in many ways an insider among the groups with whom I worked, especially the church group. As a former church volunteer and friend of many members of the congregation, I was not an outsider in spite of cultural differences. As an active member of the church I attended, my involvement did not depend on my research project. My wife and two daughters were also members of the same congregation and were friends with many of the other members. Both my wife and I served in volunteer positions in the church, as is the normal practice in the denomination. In addition, I worked with youth groups affiliated with the church and was involved in the planning and carrying out of the group's activities.

The practice of this particular denomination is that members attend the congregation in the area where they live. Consequently, we were also neighbors to fellow members of our congregation and social contacts were not limited to Sunday activities.

As a visiting researcher, my role was also determined within the university. Thus within the groups whose language I studied, my role was clearly defined.

Another point needs to be raised concerning potential accommodation to my foreignness. The church, and to a lesser extent the university, is accustomed to Cantonese-speaking foreigners. This is in marked contrast to the situation in Hong Kong as a whole where a Cantonese-speaking foreigner often finds it extremely difficult to get a native Cantonese-speaker to use the language with him, much less use it naturally. Because of past experience with Cantonese-speaking foreign missionaries, it is quite normal to most of the congregation that a foreigner speak Cantonese. Most of the people with whom I interacted did not appear to alter their language in any obvious way. However, to ensure that there are no gross differences with respect to the variables studied here, I compare my data with the data collected by my local assistant in Chapter Six.

Since my assistant is native to and a resident of Hong Kong there is no issue with her data insofar as foreignness is concerned. Moreover, her place among the church informants she interviewed is similar to mine. Her non-church informants were mostly friends and family members. There are, however, other potentially important influences and effects that could result from her involvement in the interview situation.

As a young (27-year-old) female, she sometimes encountered difficulty in establishing herself as a legitimate interviewer, especially when interviewing older males. In addition, her status as a research assistant was not as well established as mine. This difficulty usually took the form of the interviewee questioning the purpose of the interview. Once the interview was underway there was seldom any overt resistance or attempt to change roles. This problem has been noted by Wolfson (1980:68) and is discussed by Milroy (1987:47-48). Where interview roles are not clear, confusion often results. Wolfson gives this potential confusion as an argument in support of a clear and unambiguous interview structure, which both of us attempted to follow in our interviews.

It is important to note that the participant effect is only crucial with the interview data. The role of the interviewer in the impromptu section is minimal and in the public speech is effectively nil. Though no apparent differences were noted in the data with respect to the variables under consideration, checks are carried out in the following chapters to explore the effects of interview background characteristics on the three variables, ng-/0-, n-/l- and k-/h-.

4.4 Data Collection

4.4.1 Methods

Many ingenious and successful methods have been used to obtain the necessary linguistic data. These range from Labov's rapid collection methods in New York City department stores to fairly long-term stays in the community as a participant-observer (cf. Milroy 1980). In between the two is the sociolinguistic interview, which has become a standard tool since Labov's early studies. Before carrying out interviews, however, one must find a willing pool of people which is as representative of the population as possible.

As already mentioned, Labov's New York City study used a subset of a previously compiled sample frame from a sociological study. Though it began as a random sample, because of a low success rate in carrying out interviews, the sample in the end was closer to a judgmental sample.

Others, such as Shuy's Detroit study (1968), have used go-betweens to introduce them to potential informants. Shuy's method was to use the schools to gain access to student informants and their families. Bauer (1982) followed this method for part of his data collection in his Hong Kong study. He reported a very high rate of failure in carrying out the actual interviews. Questionnaires explaining the project and requesting an interview were distributed to students of three schools representing a wide socio-economic range. Out of 381 questionnaires, 102 were returned positively and just 23 people were successfully interviewed. In practice many of the potential informants were reluctant to be interviewed, being suspicious of the researcher's motives and purposes. Bauer's other 52 informants were obtained by interviewing friends and contacts in Hong Kong.

In my study 120 cases were successfully recorded, of which 82 were eventually transcribed for use. Time constraints did not allow me to transcribe the other cases for the present study. I was refused just twice and my assistant six times. Thus, our inside role allowed us to virtually eliminate the problem of refusal. Milroy's Belfast (1985) study used the participant observation method described above to collect data. The aim of this method is to study intensively a relatively small number of people by becoming involved with the group to be studied. The method has the advantage of avoiding the kind of refusal problems that Bauer encountered in his study.

In this study we adopt a participant-observer role similar to that in Milroy (1985). However, unlike Milroy I was a participant before I began my study and therefore did not have to establish a role within the group I studied. I made use of my participant role to collect speech in three distinct registers: impromptu, interview and public speech-making. Below I outline the process by which the data for this study were collected.

4.4.2 Steps in Data Collection

The first step in my data collection process was to record a public speech. The largest portion of these public speeches was from the church groups. The church's regular Sunday meetings typically include three speakers drawn from the congregation at large; the speakers are different every week. They can be from any age group (though usually teenage or older) and of either sex. The talks are usually about fifteen minutes, and prepared in advance. Typically notes are used, but the talks are rarely read. Occasionally, talks are given by non-literate or marginally literate individuals. In those cases, the main points are memorized by the speaker.

After the formal data from a given speech was collected, my assistant or I approached the individual for a recorded interview, which provided me with the second level of formality.

The interview consisted of two parts. The first part contained structured background questions. These questions were used both for speech data and to obtain personal background information used to construct speaker variables. The second part of the interview consisted of a wide range of questions concerning the informants' attitudes about Hong Kong society and culture as well as language (see Appendix 2). Younger informants were asked about schools and friends as well. My association with the Anthropology Department at the Chinese University of Hong Kong made questions about Hong Kong society very natural.

The period of my fieldwork in Hong Kong happened to coincide with a period of extreme political importance to the people of Hong Kong, namely the impending return of Hong Kong to the control of the People's Republic of China in 1997. This is a topic that was clearly on just about everyone's mind at the time of my fieldwork, less than nine years from the event. Therefore I included a series of questions concerning people's feelings on the issue. Virtually everyone was willing to talk about the topic and it often elicited passionate and unguarded speech. It is also a topic about which it might be expected that a foreigner would have interest.

For those people I knew well the second part of the interview, and especially the 1997 question, elicited speech that went far out of the structure of the interview. In many of those instances the speech was much like impromptu speech. Because of this tendency, I made a point of keeping the background portion structured and unambiguously an interview. I also made it a little longer than it otherwise needed to be. After concluding the formal interview questions, I would make a clear signal that the interview was completed by thanking the participants for their participation and by putting away my interview sheet. I then would ask them if they had any questions about what I was doing. Usually there were not too many questions and we would begin talking about other subjects of mutual interest. Since I often was acquainted with other family members, I would try to withdraw from the situation and move around the home. Most Hong Kong flats are very small and so the activity generally is centered around one or two small rooms. As such there is not too much trouble getting fairly clear recordings of speech anywhere in the room.

This approach allowed me to record the impromptu speech of many of my informants. Recording in homes was fairly efficient because I was able to interview various members of the same family and then record their casual, impromptu speech as well under normal circumstances.

With some friends I asked general permission to record their speech when they visited my home socially. Some departmental colleagues allowed me the same privilege. Lastly, there were some circumstances under which I could do limited on-the-spot transcription without a recorder. This was possible for a couple of reasons. All three of my variables are essentially dichotomous (i.e., they have only two possible realizations). Therefore the transcription was usually straightforward. Secondly, I was often in situations where I was in the presence of those who had agreed to act as informants but where a recorder was not practical (e.g., camping trips, light supervision of youth activities). In these situations I was present but not always actively involved.

The process outlined above provided me with three distinct genres of spoken language for each informant, thus allowing me to focus on register as a variable as well as on speaker background variables. All three registers were obtained for 14 speakers, who form the data base for the analysis of register. The larger data base (82 cases) focuses on other background variables and makes use of all types of speech that I was able to record.

4.5 Method of Analysis

Because of the time-consuming nature of the type of data collection described above, this study is largely a case study. The numbers are large enough, however, that a statistical test becomes useful to confirm patterns suggested by the descriptive data. Some of the formulas involved in statistical significance tests are fairly complicated and cannot be addressed here. However, I outline below the basic principles surrounding the tests that I use in this study. For those interested in the details of statistical practice, there are many excellent textbooks available for reference. In addition, there are several recent books dedicated to statistics in language study (see Woods et al. 1986; Butler 1985).

As mentioned in section 3, it is important to go beyond descriptive data, tables and other raw statistics to be certain that evident differences are indeed significant. Sometimes apparently large differences turn out not to be significant because they are drawn from small samples (see section 5.1.2). Much of the early work on sociolinguistic variation paid little attention to statistical significance. Recently, however, sociolinguists have applied more sophisticated methods of statistical analysis to sociolinguistic variation data.

Milroy's study of social networks made use of the Analysis of Variance (ANOVA) technique to handle her data. ANOVA is a statistical technique commonly used to compare means between sub-groups of a population. For example, if we find that males and females as a group have different mean scores with respect to a given variable of interest, say use of l- initial, we must then determine if the difference is the result of systematically different linguistic behavior between the sexes or a result of chance. The ANOVA technique allows us to address this question.

ANOVA produces a statistic known as the F-ratio. The F-ratio is affected by several factors including the raw scores and size of group (i.e., how many informants). The F-ratio itself is then tested for significance against the F distribution, which gives us the probability (p) of our F-ratio occurring as a result of chance. A conventionally accepted p level in statistical literature is one in twenty (or p < 0.05). If our F-ratio exceeds the critical value at the acceptable p level in the F distribution table, then we can be fairly certain that there is indeed a relationship between sex and l- frequency.

ANOVA is appropriate for continuous dependent variables where a fairly small number of cases is concerned and where all the independent variables can be expressed as discrete (i.e., categorical). One limitation of the standard ANOVA test is that it requires a balanced design when testing more than one factor simultaneously. For example, if we want to look at once at the effects of sex, age, and education level on l- frequency, there must be equal numbers of males and females, equal numbers in each age group, and equal numbers in each education group.
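To make the F-ratio concrete, the following is a minimal sketch of a one-way ANOVA on a single factor. It is not the procedure or software used in this study; the function name and the scores are hypothetical, and the critical value is the standard tabled F value for 1 and 4 degrees of freedom at p < 0.05.

```python
def f_ratio(groups):
    """One-way ANOVA F-ratio for a list of groups of scores.

    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical l- frequencies for three male and three female speakers.
males = [0.2, 0.4, 0.6]
females = [0.6, 0.8, 1.0]
f, df_b, df_w = f_ratio([males, females])

F_CRITICAL_05 = 7.71  # tabled F(1, 4) at p < 0.05
print(round(f, 2), df_b, df_w, f > F_CRITICAL_05)
```

In this invented example the group means differ by .40, yet the F-ratio (about 6.0) falls short of the critical value, which illustrates how an apparently large difference can fail to reach significance when the groups are small.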

Sometimes, however, it is desirable to have the flexibility of an unbalanced design. One might not want to exclude available data for a group of females simply because there are not an equal number of males. In practice it is often hard to get large samples of certain subgroups (for example, persons over 80 years of age).

GLM is a special type of ANOVA that allows an unbalanced design. The GLM design allows us to utilize data in hard-to-fill cells without sacrificing useful data in the easier-to-fill cells. To allow for more flexibility with my data, I employ the GLM technique in this study for the main data base.

For the third data base, where the effects of word class are analyzed, I use another technique called chi-square. Chi-square compares the counts of events (e.g., tokens of n-/l-) with expected counts across subgroups (e.g., word class: nouns, adverbs, verbs, etc.). The chi-square test gives us a way of checking whether the differences between observed and expected counts are greater than might be expected as a result of chance. By using a standard formula we obtain a chi-square statistic. As with ANOVA and GLM, we must test that statistic for significance. If that statistic exceeds the critical value at the p < 0.05 level we conclude that the result is statistically significant.
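The chi-square comparison of observed and expected counts can be sketched as below. The token counts and word-class labels are hypothetical and serve only to show the arithmetic; the critical values are the standard tabled ones for p < 0.05.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a table of observed counts.

    `observed` is a list of rows (e.g., one row per word class) with one
    count per variant. Returns (statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected if variant choice were unrelated to row.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Tabled critical values at p < 0.05, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}

# Hypothetical counts of [l] and [n] realizations by word class.
observed = [
    [40, 10],  # pronouns
    [25, 25],  # nouns
    [30, 20],  # verbs
]
stat, df = chi_square(observed)
print(round(stat, 2), df, stat > CRITICAL_05[df])
```

Here the expected counts are derived from the row and column totals, so the test asks whether the distribution of variants differs across word classes more than chance alone would produce.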

NOTES TO CHAPTER FOUR

1. The data is actually derived from 49 informants, since some of the informants were recorded separately in more than one context.

Of the forty-six recordings where an interviewer was involved in some way, I did twenty-seven and my assistant did nineteen. Thirty-four of the recordings were made with no participation of an interviewer.

CHAPTER V

Analysis of Results

5.0 Introduction

In this chapter I discuss the results of my own Hong Kong study. Although the focus is on the quantitative analysis of group data, I also make note of the more striking non-quantitative results. Also, I compare the data from some typical and non-typical individual cases to the group results. In doing so I draw on the three data bases described in the previous chapter. In addition, comparisons are made to the results of previous work on the two variables ng-/0- and n-/l-.

5.1 Analysis of Statistical Results

This section discusses the main effects of social background variables on the three sociolinguistic variables: n-/l-, ng-/0- and k-/h-. Drawing upon the first data base, our main statistical model includes four background variables: sex, age, educational level and ethnicity (home district). The possible influences of other languages spoken, home language, language of education, birth place and parents' educational level, social group and interviewer are also considered. The latter group of background variables is only noted here when the result is particularly prominent or theoretically interesting. Many of the background variables are inadequately represented (i.e., the N is too small) to make any strong claim concerning their effect on the linguistic variables. Nevertheless, a rough check was made for many of them. Full results of statistical tests and complete individual data are provided in the appendices. The second data base has fewer cases but focuses exclusively on register as an independent variable. Lastly, the third data base is composed of combined tokens from all informants without regard to background variables and focuses on lexical category. A complete list can be found in Appendix G.

5.1.1 The Variable ng-/0-

As discussed in chapter 3, determining the base form for the ng-/0- variable is sometimes difficult. The most straightforward way to deal with the problem is to assign a base form according to historical category. For the most part, Cantonese references accept the historical forms as the prescriptive forms (see section 3.1.2). The historical forms are largely divided according to tonal register: upper register tone syllables are realized with zero initial and low register tone syllables with velar nasal (Hashimoto 1972: 141-45). However, there are some obvious exceptions to the pattern with certain colloquial Cantonese words which have no established historical etymon.

For example, it is widely acknowledged that Cantonese ngaam 'correct' and ngaak 'to cheat' prescriptively have a velar nasal initial in spite of their upper register tone. With those two exceptions, I classify my ng-/0- data by register for the purposes of this study. Thus, there are two related phenomena that concern us here: first, the tendency of speakers to use the [0-] form where there is an historical velar nasal, and secondly, the tendency to preserve the zero initial where one existed historically, or alternatively the tendency of speakers to affix a non-etymological velar nasal where there was an historical zero initial. For the presentation of the data in this study, I basically follow historical categories to classify words as prescriptively velar nasal initial or zero initial. First the data for historical velar nasal are presented, followed by the data for historical zero initial. When there is a clear consensus among reference works concerning colloquial words such as ngaam, and that consensus goes against historical classification, I have accepted the references' classification. Fortunately, these problematic cases do not represent a large part of the total data and are not likely to distort the data in any case.¹

Variable ng-/0- : Historical *ng-

Table 19 below provides the mean [0-] frequencies for words with historical *ng- initial by each main social factor. Below each factor are the F statistic and P level (see section 4.5).

Table 19 Mean [0-] Frequencies by Sex, Age, Education and Place of Origin (*ng-)

                       N     Mean
Overall               80    0.252

Sex
  male                48    0.166
  female              32    0.381
  F=6.00  *P=0.020

Age
  1-18                12    0.662
  19-30               21    0.367
  31-45               36    0.095
  46+                 11    0.099
  F=4.68  *P=0.008

Education
  0-6                 16    0.146
  7-12                41    0.290
  13+                 15    0.233
  F=1.96  P=0.158

Place of origin²
  Guangzhou & area    42    0.285
  Seiyap Area         11    0.125
  N.E. Delta           6    0.263
  Zhongshan & area    10    0.367
  Other                5    0.008
  F=0.21  P=0.931

*Statistically significant at p < .05 level

The means in table 19 represent the average proportion (or percentage) of [0-] over the total number of *ng- tokens by sub-group. For example, 16.6% of all tokens of *ng- uttered by male speakers were realized as [0-], while female speakers produced 38.1% [0-]. The frequency of [0-] decreases steadily (from .662 to .009) as age increases. Note though that there is virtually no difference between the two older age groups with respect to their production of the *ng-/0- variable. Education level does not correlate in our sample with production of [0-] in spite of an apparently large disparity between the first group on one hand and the second and third groups on the other hand. Likewise, there seem to be fairly large differences in the average frequencies among the places of origin (from Other at .008 to Zhongshan at .367). The differences are not statistically significant because of the small number of speakers who have origins outside the Guangzhou-Hong Kong area. Thus, of the four background variables, only sex and age are statistically significant for *ng-/0-.

Though the above summary of the data provides us with a general picture of the variable ng-/0- with respect to our four main social factors, the situation is not so straightforward in reality. Since few factors are totally independent of one another, there are likely to be interactions between background variables. For example, up to a certain point, the older a person the more likely the person is to have received more education. As a result, age and education usually interact. Accordingly, it might be useful to look not only at frequency of use across age but within sub-categories of age and education. Note table 20 below.

Table 20 Mean [0-] Frequencies by Sub-groups for Sex, Age and Educational Level (*ng-)

                        Male            Female            ALL
                     Mean    N       Mean    N       Mean    N
Age G1 (1-18)
  Ed.G1 (1-6)       0.333    1      0.690    1      0.511    2
  Ed.G2 (7-12)      0.726    3      0.677    5      0.696    8
  Ed.G3 (13+)          --    0         --    0         --    0
Age G2 (19-30)
  Ed.G1 (1-6)          --    0         --    0         --    0
  Ed.G2 (7-12)      0.243    5      0.420   10      0.361   15
  Ed.G3 (13+)       0.220    4      0.546    3      0.383    7
Age G3 (31-45)
  Ed.G1 (1-6)       0.064    6      0.300    3      0.143    9
  Ed.G2 (7-12)      0.046   14      0.073    5      0.052   19
  Ed.G3 (13+)       0.115    8      0.277    1      0.133    9
Age G4 (46+)
  Ed.G1 (1-6)       0.007    4      0.000    1      0.005    5
  Ed.G2 (7-12)         --    0         --    0         --    0
  Ed.G3 (13+)          --    0         --    0         --    0
ALL                 0.145   45      0.407   29      0.246   74

With one exception, all sub-groups where there are data conform to the general pattern in table 19. In all sub-groups the frequencies for females are greater than for males. Moreover, within age groups there are no notable differences across education groups. Note though that in the second education group of Age group 1, males have a slightly higher [0-] frequency than females. Education group 1 of that same age group has the opposite pattern, but is only represented by one male and one female speaker. One might suggest that there is a gender difference with respect to [0-] frequency, but it does not emerge until later in life.

Another major factor here is speech register. The basic data, taken from the second data base, are presented below in table 21.

Table 21 Mean [0-] Frequency by Speech Register (*ng-)

Speech Type        N    Mean
Impromptu         14    .356
Interview         14    .218
Public Speech     14    .087
Combined          14    .220

F=3.75  *P=0.032

*Statistically significant at p < .05 level

Table 21 shows a pattern of decreased use of the [0-] form as the level of formality increases. Moreover, the difference is statistically significant (p < .05).

Figure 7 below illustrates the breakdown according to word class with respect to the variable *ng-. Token numbers are above each bar for their respective word classes. For the individual words please see appendix G.

[Figure 7: bar chart of [0-] frequency (vertical axis) by word class (horizontal axis: adv, verb, noun, pro, sv, Combined), with token numbers above each bar.]

Figure 7 [0-] Frequencies by Word Class (*ng-)

Though it was not my original intention to consider non-social variables in this study, casual observation during the early stages of the study suggested that word class might be an important factor, particularly with respect to the first person pronouns ngoh 'I, me' and ngohdeih 'we, us' and the demonstrative affix ni- 'this' (e.g., nidouh 'this place or here'). In fact, the data confirm my casual observation.

The combined [0-] frequency of .22 (table 21) tends to hide the wide variation between word classes as illustrated in figure 7. The classes of adverb and verb on the high end of the scale and the class of stative verb (adjective) on the low end seem to vary markedly. Applying a chisquare test to the total token count statistically confirms our observation. Note table 24 below.

Table 24 Chisquare Analysis for ng-/0- by Word Class (*ng-)

Expected counts are printed below observed counts

           adv     noun       pro       sv     verb    Total
[ng-]       14      154      1877       71       15     2131
         33.80   160.36   1851.95    61.31    23.58
[0-]        29       50       479        7       15      580
          9.20    43.64    504.05    16.69     6.42
Total       43      204      2356       78       30     2711

ChiSq = 11.599 + 0.252 + 0.339 + 1.531 +  3.123 +
        42.617 + 0.926 + 1.245 + 5.624 + 11.474 = 78.729
df = 4

The chisquare value of 78.729 is over the accepted critical values (9.4877 at the .05 level and 13.2767 at the .01 level). Any value over the critical value confirms the non-independence of word class and the variable *ng-/0-.

Once we have established non-independence, we need to examine the individual cells to see which ones contribute most to the chisquare value. We can determine that contribution by noting the individual chisquare values following the table above. By doing so, we can see that adverbs and verbs have a notably greater tendency to be realized with the innovative [0-] initial, whereas stative verbs (sv) have a lesser tendency toward [0-].
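The expected counts and the cell-by-cell contributions reported for table 24 can be reproduced from the observed token counts alone. The following Python sketch is offered only as an illustration of that arithmetic; the function and variable names are hypothetical and are not part of the original analysis.

```python
# Observed token counts from table 24 (rows: realized initial; columns: word class).
CLASSES = ["adv", "noun", "pro", "sv", "verb"]
OBSERVED = {
    "[ng-]": [14, 154, 1877, 71, 15],
    "[0-]": [29, 50, 479, 7, 15],
}


def chi_square_table(observed):
    """Return expected counts, per-cell contributions, the statistic and df."""
    rows = list(observed)
    n_cols = len(observed[rows[0]])
    row_totals = {r: sum(observed[r]) for r in rows}
    col_totals = [sum(observed[r][j] for r in rows) for j in range(n_cols)]
    grand_total = sum(row_totals.values())

    # Expected count under independence: row total * column total / grand total.
    expected = {
        r: [row_totals[r] * col_totals[j] / grand_total for j in range(n_cols)]
        for r in rows
    }
    # Each cell contributes (observed - expected)^2 / expected.
    contributions = {
        r: [
            (observed[r][j] - expected[r][j]) ** 2 / expected[r][j]
            for j in range(n_cols)
        ]
        for r in rows
    }
    statistic = sum(sum(cells) for cells in contributions.values())
    df = (len(rows) - 1) * (n_cols - 1)
    return expected, contributions, statistic, df


expected, contributions, statistic, df = chi_square_table(OBSERVED)
for row, cells in contributions.items():
    for word_class, value in zip(CLASSES, cells):
        print(f"{row:6s} {word_class:5s} {value:7.3f}")
print(f"ChiSq = {statistic:.3f}, df = {df}")
```

Sorting the printed contributions makes it easy to see which cells drive the result: the largest single term belongs to the [0-] adverb cell, followed by the adverb [ng-] cell and the verb [0-] cell.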

Variable ng-/0- : Historical *0-

In addition to ng-/0- variation involving words with historical velar nasal initial, there are also a smaller number of tokens (380 total tokens) with historical zero initial (*0-) exhibiting variation between ng- and 0-. Because of the small token numbers it is difficult to establish any solid correlations from the data. In spite of some apparently large differences between sub-group means, none of the differences are significant. Nevertheless, the basic descriptive statistics are presented below for reference. In spite of some suggestive patterns, the figures for *0- should be taken tentatively.

Table 23 below provides descriptive data for [0-] frequencies for words with historical zero initial by each main social factor. Below each factor are the F statistic and P level from an analysis of variance (ANOVA) test.

Table 23 Mean [0-] Frequencies by Sex, Age, Education and Place of Origin (*0-)

                       N     Mean
Overall               30    0.573

Sex
  male                23    0.477
  female               7    0.890
  F=1.90  P=0.192

Age
  1-18                 1    1.000
  19-30               10    0.746
  31-45               17    0.514
  46+                  2    0.000
  F=0.19  P=0.900

Education
  0-6                  6    0.299
  7-12                17    0.703
  13+                  7    0.493
  F=2.94  P=0.088

Place of origin
  Guangzhou & area    13    0.686
  Seiyap Area          5    0.380
  N.E. Delta           4    0.619
  Zhongshan & area     2    0.875
  Other                2    0.130
  F=1.46  P=0.269

*Statistically significant at p < .05 level

Table 24 Mean [0-] Frequencies by Sub-groups for Sex, Age and Educational Level (*0-)

                        Male            Female            ALL
                     Mean    N       Mean    N       Mean    N
Age G1 (1-18)
  Ed.G1 (1-6)          --    1         --    1         --    2
  Ed.G2 (7-12)         --    3      1.000    5      1.000    8
  Ed.G3 (13+)          --    0         --    0         --    0
Age G2 (19-30)
  Ed.G1 (1-6)          --    0         --    0         --    0
  Ed.G2 (7-12)      0.623    5      0.808   10      0.696   15
  Ed.G3 (13+)          --    4      0.994    3      0.944    7
Age G3 (31-45)
  Ed.G1 (1-6)       0.293    6      0.917    3      0.449    9
  Ed.G2 (7-12)      0.673   14         --    5      0.673   19
  Ed.G3 (13+)       0.313    8         --    1      0.313    9
Age G4 (46+)
  Ed.G1 (1-6)       0.000    4         --    1      0.000    5
  Ed.G2 (7-12)         --    0         --    0         --    0
  Ed.G3 (13+)          --    0         --    0         --    0
ALL                 0.477   45      0.890   29      0.573   74

Examining the means for the sub-groups, it seems that many of the patterns noted above for *ng- hold true for *0- as well. Moreover, the differences in means between sub-groups seem very large. However, none of the differences is statistically significant because of the small number of informants with adequate tokens of *0-. Because there were often inadequate token numbers for the *0- initials, I was not able to use the data for many of my informants. In spite of the lack of statistical significance, the patterns are strikingly similar, even though the mean [0-] frequencies for historical *0- are higher than for historical velar nasal *ng- across all sub-groups of the four main social variables illustrated above. Males are less likely to use the zero form even when it is the historical and arguably the prescribed form. The older the speaker, the less likely the speaker is to use the [0-] form. As with the velar nasal, the situation is mixed regarding educational groups. The group with the highest frequency of [0-] is the middle group, followed by the younger group and lastly the older group. The margin of difference is much larger with *0- tokens than with *ng-, however.

Turning to place of origin, we see that, as with the *ng- tokens, the mean [0-] frequencies of most sub-groups are quite close to each other. Only the Seiyap (.380) and Other (.130) categories have [0-] means that seem to vary substantially from the rest. The Seiyap and Other figures are based on small N and consequently the differences are not statistically significant.

Unlike words with historical *ng-, *0- initial words do not show the same pattern of decreasing [0-] frequency with increasing formality. Though the means are equally disparate, they are based on far fewer cases and the difference is not statistically significant. See table 25 below.

Table 25 Mean [0-] Frequency by Speech Register (*0-)

Speech Type        N    Mean
Impromptu         14    .542
Interview         14    .827
Public Speech     14    .369
Combined          14    .579

F=2.52  P=0.109

5.1.2 The Variable n-/l-

The variable n-/l- is the most straightforward of the three under discussion. Although l- to n- hypercorrection has been reported under limited situations (see Pan's (1981) word list data), the variation essentially goes one way, *n- > l-. In informal pilot data, I found virtually no cases of *l- > n-. Consequently, I did not transcribe *l- tokens when gathering my data. The n-/l- variable correlates with many of the same social and situational factors as the ng-/0- variable, though in a somewhat different way. Table 26 below summarizes the mean [l-] frequencies for the sub-groups of our four main factors.

Table 26 Mean [l-] Frequencies by Sex, Age, Education and Place of Origin (*n-)

                       N     Mean
Overall               81    0.653

Sex
  male                49    0.612
  female              32    0.715
  F=6.11  P=0.807

Age
  1-18                12    0.854
  19-30               22    0.700
  31-45               36    0.669
  46+                 11    0.282
  F=3.45  *P=0.028

Education
  1-6                 16    0.600
  7-12                41    0.699
  13+                 16    0.732
  F=0.30  P=0.743

Place of origin
  Guangzhou & area    42    0.720
  Seiyap Area         12    0.550
  N.E. Delta           6    0.412
  Zhongshan & area    10    0.756
  Other                5    0.452
  F=2.98  *P=0.043

*Statistically significant at p < .05 level

As with the ng-/0- variable, age proves to be a significant factor with the variable n-/l-. Females again are more likely to use the innovative form [l-] than are males, although here the difference between the sexes is not statistically significant (P=0.807). Likewise the younger speakers are more apt to use the [l-] form than are the older speakers, but the stratification pattern is different. The three younger age groups are very close with respect to their frequency of use of [l-], whereas the oldest group (46 and over) uses it substantially less. With ng-/0- (*ng-), the crucial cut-off is between the second group (19-30 years) and the third group (31-45), the third and fourth groups hardly differing at all. The use of [l-] seems to actually increase with education, from .600 to .699 to .732. The difference, however, is small and not statistically significant. Place of origin is a significant factor with [l-] frequency. Both the Guangzhou area and the Zhongshan area exhibit a substantially higher [l-] frequency. Northeastern Pearl Delta and the Other group show a substantially lower [l-] frequency. The latter two groups include informants whose home areas are under the potential influence of other dialect groups such as Min and Kejia: Qingyuan, Huiyang and Huidong.

Table 27 below charts three of the four main social factors with respect to n-/l- as they interact with each other.

Table 27 Mean [l-] Frequencies by Sub-groups for Sex, Age and Educational Level (*n-)

ROWS: age-category / education-category    COLUMNS: sex

                        Male            Female            ALL
                     Mean    N       Mean    N       Mean    N
Age G1 (1-18)
  Ed.G1 (1-6)       0.883    1      1.000    1      0.942    2
  Ed.G2 (7-12)      0.873    3      0.826    5      0.843    8
  Ed.G3 (13+)          --    0         --    0         --    0
Age G2 (19-30)
  Ed.G1 (1-6)          --    0         --    0         --    0
  Ed.G2 (7-12)      0.502    5      0.730   10      0.654   15
  Ed.G3 (13+)       0.814    4      0.786    3      0.802    7
Age G3 (31-45)
  Ed.G1 (1-6)       0.668    6      0.626    3      0.654    9
  Ed.G2 (7-12)      0.635   14      0.802    5      0.672   19
  Ed.G3 (13+)       0.638    8      1.000    1      0.679    9
Age G4 (46+)
  Ed.G1 (1-6)       0.352    4      0.421    1      0.366    5
  Ed.G2 (7-12)         --    0         --    0         --    0
  Ed.G3 (13+)          --    0         --    0         --    0
ALL                 0.637   45      0.760   29      0.685   74

CELL CONTENTS: [l-] frequency MEAN, COUNT

The patterns for age and sex in table 26 hold within sub-groups as charted in table 27. However, the pattern for education is slightly more complicated. At first glance it appears that [l-] frequency decreases with education within the youngest age group and increases with education within the second group. The other two age groups do not vary greatly with education. For several reasons, however, the means are not a reliable reflection of the situation. First of all, the cells in the first age group are all small if not empty. The second age group has no speakers within the lowest education category (0-6). Because cells with few or no speakers are to be expected when age and education interact, it is hard to draw conclusions on the basis of data of the kind in table 27.

Turning again to speech register, we can see patterns similar to those shown with the *ng- tokens of variable ng-/0-. The basic data, taken from the second data base, are presented below in table 28.

Table 28 Mean [l-] Frequency by Speech Register (*n-)

Speech Type        N    Mean
Impromptu         14    .805
Interview         14    .582
Public Speech     14    .386
Combined          42    .591

F=4.64  *P=0.016

*Statistically significant at the p < .05 level

As with ng-/0- (*ng-), the differences in the means are statistically significant and remarkably regular. Again [l-] frequency falls with increased formality.

As with ng-/0-, word class is again an important factor with the n-/l- variable. However, the contrast is between different word classes. Observe figure 8 below. Again note the token numbers above each bar.

[Figure 8: bar chart of [l-] frequency (vertical axis) by word class (horizontal axis: verb, sv, adv, noun, pro, dem, part, Combined), with token numbers above each bar.]

Figure 8 [l-] Frequencies by Word Class (*n-)

On the high end, more than 80% of the tokens within the classes of verb, stative verb and adverb are realized with [l-] initial. On the low end, the classes of demonstratives and particles have frequencies less than .40.

The small token numbers within the adjective and particle class make judgments concerning them difficult, but there are more than enough tokens to establish a pattern with the other word classes. On the basis of token counts, we can construct a chisquare contingency table as follows.

Table 29 Chisquare Analysis for n-/l- by Word Class (*n-)³

Expected counts are printed below observed counts

           adv      dem     noun    part      pro       sv        v    Total
[n-]        19      360      121       9      218       22       38      787
         35.62   202.28   168.40    3.11   226.14    45.64   105.81
[l-]        84      225      366       0      436      110      268     1489
         67.38   382.72   318.60    5.89   427.86    86.36   200.19
Total      103      585      487       9      654      132      306     2276

ChiSq =  7.752 + 122.971 + 13.341 + 11.140 + 0.293 + 12.247 + 43.457 +
         4.097 +  64.995 +  7.051 +  5.888 + 0.155 +  6.473 + 22.969 = 322.826
df = 6

1 cell with expected count less than 5.0

The chisquare value of 322.826 is well over accepted critical values (12.59 at the .05 level and 16.81 at the .01 level). The demonstrative class clearly is the most outstanding in its contribution to the chisquare value. Words in the demonstrative class are much less prone than words in other classes to be realized with [l-]. This class includes those words prefixed by ni- 'this' (e.g., nidouh 'here', nigo 'this', nidi 'these', etc.).
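The critical values cited for the two word-class tests (9.4877 and 13.2767 at df = 4; 12.59 and 16.81 at df = 6) can be checked numerically. The Python sketch below is a minimal illustration that relies on the closed-form upper-tail probability of the chisquare distribution, which holds only for even degrees of freedom; the function names are hypothetical, and the bisection search is simply one convenient way to invert the tail probability.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail probability P(X > x) for a chisquare variable with even df."""
    if df % 2 != 0:
        raise ValueError("this closed form only holds for even df")
    half = x / 2.0
    # For df = 2m: P(X > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )


def chi2_critical(alpha, df, lo=0.0, hi=200.0):
    """Find x with P(X > x) = alpha by bisection (the tail decreases in x)."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_upper_tail(mid, df) > alpha:
            lo = mid  # tail still too heavy: the critical value lies higher
        else:
            hi = mid
    return (lo + hi) / 2.0


for df in (4, 6):
    at_05 = chi2_critical(0.05, df)
    at_01 = chi2_critical(0.01, df)
    print(f"df = {df}: .05 level {at_05:.4f}, .01 level {at_01:.4f}")
```

Both reported statistics (78.729 at df = 4 and 322.826 at df = 6) lie far above the corresponding critical values, so the conclusion of non-independence does not hinge on the choice of significance level.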

5.1.3 The Variable k-/h-

The variable k-/h- is distinct from the preceding two in that it only occurs with the third person pronouns keuih 'he/she' and keuihdeih 'they or them.' Thus this discussion includes no analysis of word class as a variable, since it is restricted to the class of pronoun only. There are no reports suggesting k-/h- as a variable occurring with non-pronouns, nor did I observe it myself anywhere else. Possible explanations for this restriction are taken up in the following chapter. Here I present the data for social factors along with the contextual factor of register.

Table 30 and table 31 summarize the descriptive data for k-/h-.

Table 30 Mean [h-] Frequencies by Sex, Age, Education and Place of Origin (*k-)

                       N     Mean
Overall               82    0.309

Sex
  male                45    0.252
  female              31    0.390
  F=1.87  P=0.181

Age
  1-18                11    0.648
  19-30               22    0.414
  31-45               34    0.212
  46+                  9    0.000
  F=3.47  *P=0.027

Education
  0-6                 16    0.196
  7-12                39    0.302
  13+                 15    0.484
  F=1.97  P=0.156

Place of origin
  Guangzhou & area    39    0.319
  Seiyap Area         12    0.184
  N.E. Delta           6    0.306
  Zhongshan & area    10    0.365
  Other                4    0.318
  F=0.31  P=0.866

*Statistically significant at the p < .05 level

Table 31 Mean [h-] Frequencies by Sub-groups for Sex, Age and Educational Level (*k'-)

ROWS: age-cat / ed-cat    COLUMNS: sex

                        Male            Female            ALL
                     Mean    N       Mean    N       Mean    N
Age G1 (1-18)
  Ed.G1 (1-6)       0.429    1      0.819    1      0.624    2
  Ed.G2 (7-12)      0.867    3      0.580    5      0.662    8
  Ed.G3 (13+)          --    0         --    0         --    0
Age G2 (19-30)
  Ed.G1 (1-6)          --    0         --    0         --    0
  Ed.G2 (7-12)      0.181    5      0.355   10      0.297   15
  Ed.G3 (13+)       0.702    4      0.614    3      0.665    7
Age G3 (31-45)
  Ed.G1 (1-6)       0.156    6      0.317    3      0.210    9
  Ed.G2 (7-12)      0.155   14      0.173    5      0.159   19
  Ed.G3 (13+)       0.291    8      0.581    1      0.327    9
Age G4 (46+)
  Ed.G1 (1-6)       0.000    4      0.000    1      0.000    5
  Ed.G2 (7-12)         --    0         --    0         --    0
  Ed.G3 (13+)          --    0         --    0         --    0
ALL                 0.259   45      0.405   29      0.317   74

CELL CONTENTS: [h-] frequency MEAN, COUNT

Sex and age seem to pattern with the variable k-/h- much as they do with the preceding two variables. Again, the mean frequency for the innovative form, [h-], is higher for females (.390) than for males (.252). However, the difference between mean [h-] frequencies is not significant.

Moreover, the female means are not consistently higher across sub-groups. In Ed.G2 of the youngest age group the males have a higher mean [h-] frequency than females (male mean = .876, female mean = .580). Likewise, males in Ed.G3 of the 19-30 year age group have a higher mean [h-] frequency (male mean = .702, female mean = .614).

Although [h-] frequency generally decreases with age, there is substantial variation within age groups by education group. For example, combining both male and female data, we see an increase of [h-] frequency from Ed.G2 to Ed.G3 within Age G2 (.297 to .665). In Age G3 we see a somewhat different pattern, where the means for Ed.G1 and Ed.G3 are very similar and both higher than Ed.G2 (Ed.G1=.210, Ed.G2=.159 and Ed.G3=.327). The irregular pattern of the data argues that education is indeed an insignificant factor with regard to the k-/h- variable.

The mean differences for [h-] by place of origin suggest no particular pattern. The mean for the Seiyap group is somewhat lower than for the others but the difference is not significant. The P-value of .866 is so high that the differences may well be the result of chance.

Speech register is a significant factor with the k-/h- variable and exhibits a pattern similar to that of the other two variables: [h-] frequency decreases with increased formality level. There is virtually no difference between the impromptu speech and interview, but a very large difference between the first two and public speech. Note the figures in table 32 below.

Table 32 Mean [h-] Frequency by Speech Register (*k'-)

Speech Type        N    Mean
Impromptu         14    .368
Interview         14    .374
Public Speech     13    .073

F=4.22  P=0.022

5.1.4 Other Correlations: Dependent Variables

The data presented in this section suggest some common patterns for all three variables. One explanation for the common patterns is that the linguistic variables themselves have some measure of correlation. A commonly used measure to explore such correlation is Pearson's Correlation Coefficient. Below are the coefficients for the four dependent linguistic variables (ng-/0- is represented as two separate variables).

Table 33 Correlation between Dependent Variables

                  *ng- [0-] freq   *0- [0-] freq   [l-] freq
*0- [0-] freq         0.122
[l-] freq             0.431           0.302
[h-] freq             0.663           0.199          0.446

The closer the absolute value of the coefficient is to 1.000, the more closely the two variables correlate. If the number is positive, the two variables tend to increase together. If the number is negative, one variable tends to decrease as the other increases. A value near either end of the scale shows an association. A coefficient of 1.000 shows perfect linear correlation between the two variables. It should be noted though that the converse is not necessarily true. That is, a low coefficient does not mean no correlation, but only that there is no linear correlation.

At sample size N=82, our critical value is .217 at the p < 0.05 level and .283 at the p < 0.01 level. Only the correlation between [h-] frequency and [0-] frequency for *0- tokens fails to show significant correlation. Thus we may be able to hypothesize that the other three variables are part of a set of related linguistic markers.
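The two critical values quoted for N=82 can be recovered from the usual relation between Pearson's r and the t statistic with N - 2 degrees of freedom. The Python sketch below is illustrative only: the function name is hypothetical, the critical t values are copied from a standard two-tailed t table for 80 degrees of freedom, and the use of a two-tailed test is an assumption, since the text does not say which was applied.

```python
import math

# Two-tailed critical t values for df = 80, copied from a standard t table.
# Assumption: the source does not state which test or table was used.
T_CRITICAL_DF80 = {0.05: 1.990, 0.01: 2.639}


def critical_r(t_critical, n):
    """Smallest |r| that reaches significance for sample size n."""
    df = n - 2
    # From t = r * sqrt(df) / sqrt(1 - r^2), solved for r.
    return t_critical / math.sqrt(t_critical ** 2 + df)


for alpha, t_crit in T_CRITICAL_DF80.items():
    print(f"p < {alpha}: critical r = {critical_r(t_crit, 82):.3f}")
```

Under these assumptions the computed thresholds round to .217 and .283, matching the values given above.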

5.2 Notes on Non-Statistical Observations

In addition to the quantitative data presented in the previous section, there are a few other observations concerning the three variables worth noting. Besides my interview data, I collected several samples of older movies with the intent of comparing the speech of the roles with my quantitative data. I listened to a short stretch (about 15-20 minutes) of a movie called Wanniang [Stepmother] (Kowloon Movie Company, ca. 1950) and observed the speech of the three main characters: a widow (mid 30's), her daughter (about 10 years old) and the widow's fiance (about 40 years old).

All tokens of our three variables produced by the man were the conservative variants of the three variables ng-/0-, n-/l- and k-/h- (ng-, n- and k-). The woman produced all conservative forms except four instances of [l-], all in conversation with her daughter. The girl also used mostly conservative forms, with several tokens of the [l-] variant. The breakdown is as follows:

Widow's boyfriend:
  [l-] frequency 0/20
  [0-] frequency 0/15
  [h-] frequency 0/6

Widow:
  [l-] frequency 4/26
  [0-] frequency 1/23
  [h-] frequency 0/5
  Innovative tokens: leih 'you' (2), leuih 'girl' (2), (0)oihpoh 'maternal grandmother'

Daughter:
  [l-] frequency 5/16
  [0-] frequency 0/12
  [h-] frequency 0/3
  Innovative tokens: leih 'you' (5)

As a data source, films have limitations. Moreover, this small exercise cannot tell much about the wider usage of our three variables in the 50's. Nevertheless, the occurrence of innovative tokens does establish that variation has existed for at least several decades and probably much longer (cf. Chao 1947). It is also interesting in the context of this study to note that tokens of [l-] occurred in the daughter's speech and in the mother's speech while talking to her daughter. This pattern would tend to reinforce the role of situational context or speech register as a critical factor in the variation of the three variables under consideration here.

5.3 Anomalous Tokens

In addition to the two forms of the variable n-/l-, [n-] and [l-], I noted a limited number of tokens of a third form [j-] occurring before the high vowel [i] in the words nidi /jidi/ 'these' (15) and nihn /jihn/ 'year' (3). Since I was not attuned to the possibility of this variant earlier in the study, I may have missed some tokens in my early transcripts. Still, they are a small percentage of the total and only appear to occur in the limited phonological context noted above.

5.4 Comparison to Previous Studies

The most striking difference between my results and those of previous studies concerns sex difference. Both Yeung (1980) and Pan (1981) reported that women tend toward the conservative in both the ng-/0- and n-/l- variables. Yeung (p.30) (and Bauer 1982:212), on the other hand, reports males to be innovative with respect to the variables gw-/g- and kw-/k-. My results show that males use the conservative form with the ng-/0- variable and that there is no significant sex difference with the n-/l- variable. Although my data show decreased use of all three innovative forms with increase of age, I did not find evidence for Yeung's generalization that all young people have switched to l- (pp.8-9). Consider some of Yeung's results once again.

Table 34 Percent [l-] Scores from Yeung's Study by Age and Sex (Adapted from Yeung 1980:25)

Age       Males   Females
12-16      100      100
20-27      100       95
30-36      100       88
40-46       96       73
50-57       94       65

Table 35 Percent [0-] Scores from Yeung's Study by Age and Sex (*ng-) (Adapted from Yeung 1980:34)

Age       Males   Females
12-16       67       48
20-27       21        4
30-36       17        4
40-46       18        0
50-57       12        3

Table 36 Percent [0-] Scores from Yeung's Study by Age and Sex (*0-) (From Yeung 1980:34)

Age       Males   Females
12-16       91       86
20-27       73       96
30-36       72       96
40-46       74      100
50-57       67       97

My data agree with both Pan and Yeung with respect to the direction of age related variation. Younger speakers were shown to innovate in each case where I had sufficient data (i.e., excluding *0- tokens). On the sex factor, however, my data showed exactly the opposite pattern for both variables ng-/0- and n-/l-.

Applying a chisquare test to Yeung's raw token counts confirms there are sex differences with regard to ng-/0- and n-/l-, with one exception. The data for her *0- tokens do not meet the critical value.⁴ It appears, then, that at least for her data the patterns she suggests are accurate. The differences between our outcomes are most likely a result of differences in data. Recall that Yeung's data in her main study all come from informants repeating aloud a story given them by the investigator.

There are several ways in which data from a story might differ from my own. Reading aloud is in itself a special behavior that might invoke a distinctly different kind of speech than the speech types that I used. How might that

One explanation might be that females and males differ with respect to their story-telling behavior. It is possible that females tend toward overt prestige when performing in the role of story-teller. As mentioned earlier in this report, some have suggested that women are preservers of the standard. Though evidence from studies shows that women do not always preserve, it is possible that they play the role of preserver in at least some situations, perhaps when telling stories.

Pan does not give enough raw data to perform a significance test on his results. Therefore it is difficult to evaluate his outcome statistically. Moreover, his mean scores do not vary greatly enough to suggest that the difference would be significant.⁵ Note again some of Pan's results.

Table 39 Sex/Age Group Scores for Conservative-Form Production Across Three Contextual Styles (adapted from Pan 1981: 8)

             Group I (GI)   Group II (GII)
             (younger)      (older)          Averages
Women           .430           .613            .562
Men             .316           .350            .333
Averages        .373           .482

NOTES TO CHAPTER FIVE

1. There are a total of 3077 tokens of ng-/0- in the third data base. I have classified 2711 as historically velar nasal; 366 are classified as historically zero initial. Among the velar nasal group, there are 86 tokens in the questionable category: 7 tokens of ngaak 'to cheat' and 79 tokens of ngaam 'right, correct' (see appendix G).

2. Guangzhou and area includes the majority of my informants. This group covers much of the same territory as Yuan's coastal Yue (1960:179) and also Hong Kong and its environs. Northeast Pearl Delta includes Huiyang, Huidong and Huizhou. See 3.2.1 (p.80) for discussion of my groups. Appendix 3 provides a more detailed description of informants' and their parents' birthplaces.

3. Two tokens of *n- were removed from chisquare test because they were the only instances of tokens in the locative class. The test does not give reliable results with cell counts under 5.

4. Applying the chisquare test to Yeung's data yields the following outcomes:

Age

Variable        df    Statistic
ng/0 (*ng-)      4     133.77
ng/0 (*0-)       4      40.57
n/l              4     105.90
Critical Value = 9.4877

Sex

Variable        df    Statistic
ng/0 (*ng-)      1      19.30
ng/0 (*0-)       1       0.17
n/l              1      68.45
Critical Value = 3.8414

5. Recall that in section 5, table 5.5 the very large mean differences with respect to [0-] frequency for *0- were not statistically significant.

CHAPTER VI

Conclusions

6.1 Main Findings and their Implications

6.1.1 Variation by Sex

Of the main findings of this report on variation in Cantonese, the results on sex and lexical category are probably the most notable. The results on sex seem to challenge widely held notions about how males and females differ in their use of language. As noted in section 3.2.1, sociolinguistic literature suggests that females tend toward prestige forms when variation is along prestige/non-prestige lines, at least when the forms are overtly prestigious. Although conservative forms are not always prestigious, Pan's study indicates that for the variables under discussion here the conservative forms are in fact the overtly prestigious forms as well. The results of both Pan and Yeung are consistent with most reports on sex differences, showing females tending toward the conservative/prestigious forms. The results of my study show a statistically significant difference in two of the three variables, ng-/0- and k-/h-. However, my data show females tending toward the innovative/non-prestige forms.

The sex factor with the third variable (n-/l-) is not significant, but it appears that females may tend toward the innovative/non-prestige form as well. As discussed in section 5.4, the difference could well be a difference in the kind of data used. The disparity of the results at least calls into question overly broad claims either way. Figure 10 below summarizes the data by sex.

[Figure: bar chart comparing male, female and combined speakers on [0-] frequency (*ng-), [0-] frequency (*0-), [l-] frequency and [h-] frequency]

Figure 10 Variable Frequencies by Sex

In both Pan (1981) and Yeung (1980), numbers were quite small and no statistical significance tests were applied. However, a chisquare test applied to Yeung's raw token counts shows her differences to be significant in all but one case: with the *0- initial, [0-] frequency did not correlate with sex (see Chapter Five note 4). Because Pan's raw data were not available to me, I was not able to apply a significance test to his figures.

What is clear from the data here and from the data from other variation reports on Cantonese is that sex is an important factor in Cantonese, as it has been found to be in other languages. Moreover, the data show that at least under some circumstances, female speakers of Cantonese tend toward the innovative forms.

6.1.2 Age as a Factor: Sound Change or Age Grading

As has been true for many sociolinguistic studies, age turns out to be an important factor. The data from this study indicate age to be a significant factor in all three variables and use of the innovative forms to be inversely correlated with age; that is, the older the speaker is, the less likely he is to use the innovative form. Though all three variables show age as a significant factor, the critical boundaries are different. With the variable ng-/0- (*ng- initial), there is a fairly even decrease in [0-] frequency through the first three age groups, whereas the third and fourth groups have virtually the same frequency. With the *0- initial the large disparity is between the third and fourth groups. The n-/l- variable shows a steady decline of [l-] frequency but with age groups two and three having nearly the same frequency. The k-/h- variable shows an even decrease in use throughout all three age groups. Thus all our variables show a decrease in frequency of the innovative form as age increases, but each patterned somewhat differently.

The direction of this pattern is not surprising. If the data here do indeed represent innovation, one would normally expect that innovation to be associated with youth. If the difference is stylistic, one might expect young people to be less sensitive than adults to the subtleties of style and speech register. Consequently, they are likely to exhibit less variation, holding primarily to the forms associated with informal registers that are most common to their daily experience.¹ Figure 10 below shows the comparative mean frequencies for the four variables with respect to age.

[Figure: bar chart comparing age groups 1-18, 19-30, 31-45 and 46+ on [0-] frequency (*ng-), [0-] frequency (*0-), [l-] frequency and [h-] frequency]

Figure 10 Mean Variable Frequencies by Age

With all three variables age was statistically significant. The crucial question becomes whether the variation by age is a so-called change in progress or an age grading situation.

As described in section 5.1 above, there was little interaction between age and the other variables such as education or sex. Labov (1972: Chapter 5) has suggested that interaction (or crossover in his terminology) is an indicator of a change in progress as opposed to a stable variable. The lack of a definite interaction pattern for any of the three variables would argue that they are in fact quite stable and primarily stylistic.

Impressionistically, the variable n-/l- would seem to be the most likely of the three to be a change in progress. Only recently have Hong Kong educators said much about the young people's 'corruption' toward the [l-] variant. If the trend toward [l-] is not new, then why the sudden notice? One explanation for the recent observation is that only in the last couple of generations has mass education been implemented in Hong Kong. With a quickly growing mass enrollment come students and teachers with a variety of language backgrounds. As a result, formal prescriptive norms are more difficult to enforce. Thus, the change in progress may be an inevitable and growing tolerance for non-standard forms in formal situations, such as the classroom and the media. As mentioned in section 3.1, Chao reported the existence of n-/l- variation more than forty years ago. My spot check of a 1940's Hong Kong film also found the [l-] variant. If the /n-/ is changing categorically to [l-], it seems that the change must be much slower than commonly believed. Only a longitudinal study can determine with any certainty whether the variation is a change in progress or an age grading situation.

6.1.3 Dialect Mixture

Because of the many varieties of Chinese spoken throughout China, explanations for variation have often appealed to the possibility of dialect mixture, usually meaning spatial dialects. Often those appeals have not been made on the basis of empirical evidence but by establishing plausibility.

Hashimoto (1972:89,121,143) suggests, for example, that the tendency of some standard Cantonese speakers (i.e., Hong Kong/Guangzhou) to substitute [0-] for historical *ng- can be attributed to the influence of the Panyu dialect (near Guangzhou). Likewise, she attributes the reverse tendency to affix a non-etymological [ng-] for historical *0- to the influence of Nanhai dialect.

There are several problems with this approach. First, we cannot be certain from existing reports that the Nanhai and Panyu initials *ng- and *0- are in fact widely realized in the way she suggested. For example, Zhan and Cheung's (1988) recent survey reports Panyu to have no velar nasal initial, but that Nanhai follows historical lines.

Traditional dialect reports are highly dependent on a small number of informants and on idealized data. To illustrate further, Zhan and Cheung's 1987-1988 survey does not acknowledge any variation in Hong Kong with regard to the three variables analyzed in this report.

Even if we do acknowledge the reports to be generally true of their respective areas, we would still have to demonstrate the correlation between use of a particular variant and the background of the speaker. It seems feasible that a speaker of a dialect that lacks the velar nasal might tend not to produce it in Cantonese. Some have suggested, for example, that the apparent difficulty the Hong Kong Cantonese have in producing [n-] in English is associated with their inability to produce it in Cantonese.

This is not always the case, however. I observed that some of my informants who rarely produced the [n-] in Cantonese neih 'you' produced it in the Mandarin counterpart ni. Likewise, many in fact could produce the [n-] in English but did not in Cantonese.² Clearly, the issue is not the speaker's ability to produce it, but that they do not acquire the phoneme within the Cantonese context.

As noted in Chapter Three, dialect reports are not always in agreement concerning the usual realization of the three variables under discussion here. However, most seem to agree that Panyu lacks initial /ng-/ (Hashimoto 1972; Zhan and Cheung 1987, 1988). Among my informants were two people who claimed Panyu as their place of origin.³ The first of the two informants (see Appendix C, case 29, female, age 87) was recorded in interview only and used no [0-] for *ng- tokens (i.e., she preserved historical forms throughout). The second informant (case 13, female, age 23) showed higher than average [0-] frequency (group mean = .252), but only in the first two registers (impromptu = .765, interview = .617). Her score in the public speech register was .256.

More difficult than demonstrating how an individual's other dialect background might influence his or her use of Cantonese is demonstrating how the pronunciation of a particular village (e.g., Panyu or Nanhai) might influence the standard Cantonese of Hong Kong, either directly or through Guangzhou. Though there is an abundant literature concerning contact-related language variation (see Thomason and Kaufman 1988), influence of one variety over another usually occurs under very special circumstances. To make a case for a large scale influence of Panyu on Guangzhou or Hong Kong, one would have to present evidence of extraordinary historical circumstances, such as a massive influx of Panyu speakers into Hong Kong. The 1981 census of Hong Kong (Basic Tables; 9-9) shows 1,480,112 (approximately 54%) persons with origins in Guangzhou, Hong Kong and adjacent areas. Unfortunately, no further breakdown is provided.

Again, assuming the Zhan and Cheung report to be widely reflective of the dialects they describe, the variable n-/l- presents a more promising case for dialect influence. Five dialects (four of them Yue) in close proximity to Hong Kong are reported as lacking the n- initial: Macau, Panyu, Shunde, Jiangmen and Shenzhen. On the other hand, Hashimoto (1972:120) suggests Nanhai as a source of Hong Kong's variation, while Whitaker (1952:31) suggests Swatou (Shantou) or Hainan as a source. Zhan and Cheung's report, however, does not mention any tendency toward [l-] for either Nanhai or Swatou.

My data show only one case of statistically significant correlation between place of origin and a dependent variable. The Guangzhou area and Zhongshan area correlate with [l-] frequency in the sample. Because the Guangzhou area includes many areas that themselves may contain variation with regard to our three dependent variables, the correlation does not tell us too much. Unfortunately, my sample did not include an adequate number of informants from each data point to attempt a statistical analysis.

Within the Guangzhou area group were those who claimed place of origin to be Hong Kong (8), Guangzhou City (3), Panyu (1), Bao'an (4), Dongguan (2) and Shunde (2). Of the six places only Panyu and Shunde are said to lack /l-/ (see figure 11 below). Because Hong Kong itself is included in the Guangzhou area, the correlation is partly to itself. Consequently, the figure becomes less meaningful. However, Hong Kong is also considered an *n-/*l- distinct area by Zhan and Cheung. Potentially more interesting is the correlation to Zhongshan. Because the correlation figures represent just one place and only ten cases (four informants), one might be able to make a case for dialect influence from Zhongshan, although Zhongshan too is considered a *n-/*l- distinct area by dialect reports.

[Figure: map of dialect points in the Pearl River Delta; key: *ng-/*0- distinct, ng- only, 0- only, *ng-/*0- mixed]

Figure 5 Dialect Map of the Pearl River Delta: n-/l- Distribution

What becomes clear in our discussion here is that one really cannot make a firm hypothesis about dialect influence based on the type of reports available to us at this point. Moreover, my own data are not suitable for clarifying the question. The conflicting reports on the three variables under discussion here suggest only that variation in the Pearl Delta is indeed prevalent. What the pronunciation norms are in any given location is difficult to establish. As far as the other two variables ng-/0- and k-/h- are concerned, there was neither statistical correlation nor any suggestive pattern. Of course, the lack of statistical correlation does not prove that there is none, but only that the correlation was not borne out in the data at hand. Small numbers of speakers made the correlation difficult to establish.

Influence of one variety of Chinese over another is certainly a viable concept and the role of contact in language change is well documented.⁴ Can our variables here be explained in terms of dialect contact? The data from this and other studies (Yeung, Pan) do not support the claims that have been made.

6.1.4 The Lexicon and Variation

Another interesting result of this study concerns the effect of word class on the three variables. It was not an original goal of my investigation to study independent linguistic factors. However, in the exploratory stages of my research I detected what seemed to be a large difference between frequency of [l-] initial in the demonstrative morpheme ni- 'this' and its frequency in other words. In response to this difference, I calculated frequencies for demonstratives and a number of other word classes. The results here indicate that there is indeed a difference between the demonstrative category and the others. Let us look at the distribution by word class once again.

[Figure: bar chart of [l-] frequency for the word classes verb, sv, adv, noun, pro, dem and part, and for all classes combined, with token counts above the bars]

Figure 8 [l-] Frequencies by Word Class (*n-)

In the case of the demonstrative, there is a much lower occurrence of the [l-] than for the sample as a whole. Also showing low frequency of [l-] is the particle class. The results for the particle class should be taken as more tentative though, since it is represented by just 9 tokens. Nevertheless, the disparity is great enough that it merits note. All demonstratives in Cantonese affix one of two morphemes: ni- or go-. Since only ni- involves the variable n-/l-, it would be somewhat misleading to assert that the word class of demonstrative as defined by one morpheme is a factor in n-/l- variation. Therefore, one should be cautious in making any claims concerning word class on the basis of ni- alone.

I am not able to offer any definitive explanation for the tendency of ni- to preserve initial *n-. However, it is possible that phonological environment plays a critical role with ni-. Because there are no common words in Cantonese that share the same pronunciation as ni-, it is difficult to compare homophonous words or segments. Huang's (1970) dictionary gives only one other word, the verb ni- 'to hide.' However, there are no occurrences of ni- 'to hide' or any other homophonous character in my data base. In contrast, words such as nihn /nin/ 'year' and nihm /nim/ 'to commemorate' were widely realized with initial [l-]. For example, nihn was realized 77 times with n- initial and 109 times with l- initial (see appendix E).

One possible explanation for the special behavior of the demonstrative affix ni- might concern the tendency of deictics to be found in stressed position within a sentence. Since I did not transcribe for stress pattern in this study, I cannot offer any conclusions on the question of stress environment at this point. A reanalysis of the tapes with stress in mind would be an interesting avenue for follow-up study.

That the class of particles behaves somewhat differently from the other word classes in the language is not too surprising if one considers that particles are anomalous in many other ways. They are more likely to be realized in a neutral tone and tend to operate on a different syntactic and discourse level than other words. Predictably, then, the small group of particles (9) in my data base behaves differently than most of the other word classes. What is a little more surprising is that all were realized with n- initial. Final utterance particles in Cantonese tend toward reduction and, in fact, it has been argued that many particles developed from contraction of two separate particles.⁵ However, all nine particles were occurrences of the rhetorical question particle ne, which, unlike many other final particles, is often stressed. Because most final utterance particles in Cantonese are colloquial, it is usually not possible to appeal to historical forms to establish a base form for them. However, there are few that ever occur with initial [n-]. In contrast, there are many particles that are exclusively realized with [l-] initial.* Besides the question particle ne, I could not reliably classify any other particles as n- initial. As a result, with the n-/l- variable, the class of particle is really limited to the one particle ne as far as this study is concerned. Consequently, as with the affix ni-, any claims concerning word class based on ne alone would be misleading.

Word class was also a factor with the ng-/0- variable, though the word classes most prominently at variance were different than with n-/l-. In the case of ng-/0-, stative verbs were less likely to have a [0-] initial, whereas verbs and adverbs were more likely to be realized with [0-]. Note again figures for the variable ng-/0- below.

[Figure: bar chart of [0-] frequency for the word classes adv, verb, noun, pro and sv, and for all classes combined, with token counts above the bars]

Figure 7 [0-] Frequency by Word Class (*ng-)

Many of the words in my corpus do not appear to be phonologically conditioned in the conventional sense. For example, the word ngoh 'I, me' may be variously realized as either /ngo/ or /o/ in a single context (say, sentence initial) and by the same speaker. Nevertheless, it is possible that words with the *ng- initial may tend toward a particular realization depending upon phonological or syntactic environment. Since I do not carry out a variable rule-type analysis in this study, I make no definite conclusions concerning my data with respect to a word's tendency to be realized in one way or another. Patterns do seem to be suggestive, however, and to demand further analysis.

I did observe what seems to be perseveratory assimilation with the word kwaahn-naahn 'difficulty'. There were nine instances of kwaahn-naahn and each time the morpheme naahn was realized with initial [n-]. In addition, n- initial seemed to be preserved in syun-neuih 'granddaughter' (8 times as [n-] and once as [l-]) (see Appendix E).
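The per-word counts cited in this section can be turned into [l-] frequencies in the same way as the group figures. The sketch below is only illustrative: it uses the three words for which counts are given in the text (nihn, kwaahn-naahn and syun-neuih), and the function and dictionary names are hypothetical rather than taken from the study.

```python
def innovative_frequency(conservative_tokens, innovative_tokens):
    """Share of tokens realized with the innovative variant."""
    total = conservative_tokens + innovative_tokens
    if total == 0:
        raise ValueError("no tokens recorded for this word")
    return innovative_tokens / total


# word -> (tokens with [n-], tokens with [l-]), as cited in the text
TOKEN_COUNTS = {
    "nihn": (77, 109),
    "kwaahn-naahn": (9, 0),
    "syun-neuih": (8, 1),
}

L_FREQUENCY = {
    word: innovative_frequency(n_tokens, l_tokens)
    for word, (n_tokens, l_tokens) in TOKEN_COUNTS.items()
}

for word, frequency in L_FREQUENCY.items():
    print(f"{word:<13} [l-] frequency = {frequency:.3f}")
```

With only nine tokens each for kwaahn-naahn and syun-neuih, these proportions are best read as descriptive, in line with the caution expressed above about small classes.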

The great majority of speakers had tokens of each variant of both the ng-/0- and the n-/l- variable. Though with the k-/h- variable there were a number of speakers who preserved the [k-] form throughout, still many had a mixture of the two variants.^

Another restrictive point of variation concerns the k-/h- in the third person pronouns keuih 'he/she' and keuihdeih 'they'. With these two forms sex, speech register and age are all significant factors. What is most striking, however, is that the variation seems exclusive to the pronouns and was not found in other words of similar form. For example, deih-keuih 'area' is never pronounced deih-heuih in any of the data, nor have I noted it realized so elsewhere.

Pronouns and other high frequency words tend to behave somewhat differently from other forms and are especially susceptible to reduction. Examples of this phenomenon can be found in the pronoun systems of other languages as well. Note, for example, the restrictive reduction in the American English contraction we'll for we will, often read /wɪl/. By contrast, the homophone wheel is rarely if ever pronounced /wɪl/. Another example can be found in Canadian French with respect to je suis 'I am', which is commonly contracted to ,3 'uis. On the other hand, the homophonous je suis 'to follow' is not contracted.

Variation according to word class has potential implications for the theory of lexical diffusion. Some of the variation between word classes in this study can potentially be explained by phonological environment. Still, there are several disparities that deserve closer study with respect to the lexicon. Particularly interesting is the high [0-] frequency for adverbs and nouns on one hand and the much lower [0-] frequency for stative verbs and verbs on the other for words with *ng- initial. A dedicated study on environmental factors would help solve the question.

6.1.5 Speech Register as a Variable

Figure 11 below illustrates the effect of register on the three variables.

[Figure: bar chart comparing the impromptu, interview and public speaking registers on [0-] frequency (*ng-), [0-] frequency (*0-), [l-] frequency and [h-] frequency]

Figure 11 Mean Variable Frequencies by Speech Register

With the exception of those words involving initial *0-, all the variables significantly correlate with speech register, though the crucial boundaries are different for each. With *ng- initial the innovative form decreases evenly from impromptu to public speaking style. With the n-/l- variable, the critical difference is between impromptu speech and interview on one hand and public speech on the other. The k-/h- variable shows yet another pattern. There is virtually no difference in the frequency of the innovative [h-] between impromptu and interview, but the public speaking register shows a markedly lower frequency of [h-].

I have implied throughout this report a continuum of sorts from least formal (impromptu) to most formal (public speaking). If we judge formality by propensity toward conservative forms, the continuum seems justifiable.

Nevertheless, it is not crucial we place these speech types on a continuum. The important issue here is that all three situations are examples of legitimate speech events and that speakers of Cantonese react differently to each of those events with respect to their usage of the phonological variables under study here.

Certainly, there are numerous speech situations and speech registers that might be considered along with the three examined here. Furthermore, there are speech events within speech events. For example, within all three speech types there were narrations, and within some of the public speaking segments people read aloud. Conceivably, narration and recitation could be considered as variables and be correlated with phonological and other dependent variables.

6.2 Other Notes on Outcome

In addition to the main findings discussed above, there are several results that should be noted for their non-significance. Education is one factor that was insignificant with each variable but might have been expected to play a role in the stratification of the linguistic variables. Note the summary graph below.

[Figure: bar chart comparing education levels of 0-6, 7-12 and 13+ years on [0-] frequency (*ng-), [0-] frequency (*0-), [l-] frequency and [h-] frequency]

Figure 12 Mean Variable Frequencies by Education Level

Education is often used in sociolinguistic studies as an indirect measure of social status. It is more easily measured and defined than social indices. Unfortunately, education level is not always an accurate reflector of social status, especially in a rapidly developing society such as Hong Kong's. With the rapid implementation of mass education in Hong Kong, children of many different socio-economic backgrounds have been integrated into the education system in the last 40 years. Also, unlike in North America, there is a wide range of options for primary and secondary schools. Because these schools vary widely as to their prestige, factoring in the type of school students attend might have been useful.

One other problem in using a simple 'years of education' measure for education is that it was difficult in practice to measure the educational levels of older informants. Many well-educated people of the oldest group (particularly those over about 60) did not participate in a formal educational experience, but worked with tutors. Moreover, females from poor or well-to-do families often did not have formal training but were nevertheless educated at home. It is difficult to compare educational levels under different situations, and more importantly, that measure is not a consistent measure of social status.

Two other potentially interesting factors that turned out not to be critical relative to the data both have to do with the influence of the interviewer. Recall from section 4.3.1 that the interview data were collected by two interviewers, an assistant and myself. There are a number of personal attributes that might reasonably be expected to influence the speech of the informants:

Interviewer #1: early 30's, male, non-Chinese, researcher status
Interviewer #2: late 20's, female, native Hong Kong, assistant status (non-formal academic affiliation)

The potential influences of the interviewer's background are virtually endless and extremely complex. At this point, I have only sought to capture gross effects. To that end, I coded all interview data for interviewer. Note table 37 and figure 13 below.

Table 37 Mean Frequencies by Interviewer

Variable              Interviewer   N     Mean
*ng [0-] Frequency    1             16    .307
                      2             9     .289
*0 [0-] Frequency     1             8     .860
                      2             3     .577
[l-] Frequency        1             16    .557
                      2             9     .498
[h-] Frequency        1             16    .286
                      2             9     .350

[Figure: bar chart comparing Interviewer 1 and Interviewer 2 on [0-] frequency (*ng-), [0-] frequency (*0-), [l-] frequency and [h-] frequency]

Figure 13 Mean Variable Frequencies by Interviewer

Variable              N     F-Ratio   Probability
*ng [0-] Frequency    46    1.43      0.246
*0 [0-] Frequency     11    3.30      0.052
[l-] Frequency        25    0.35      0.704
[h-] Frequency        25    1.25      0.292

With the exception of the data for *0- tokens, all of the means are quite close and none of the differences is statistically significant. The results of this simple breakdown do not demonstrate that the backgrounds of the interviewers are unimportant (cf. Walters), but that they do not appear to grossly influence the variables under consideration here.
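To make the size of the interviewer differences concrete, the means in table 37 can be pooled and compared directly. This is a rough sketch under one assumption: that each mean is an average of per-informant frequencies over the N informants listed, so that an N-weighted average recovers the combined mean. The names used are illustrative only, and no significance test is attempted here.

```python
# variable -> [(N, mean) for interviewer 1, (N, mean) for interviewer 2]
TABLE_37 = {
    "*ng [0-]": [(16, 0.307), (9, 0.289)],
    "*0 [0-]": [(8, 0.860), (3, 0.577)],
    "[l-]": [(16, 0.557), (9, 0.498)],
    "[h-]": [(16, 0.286), (9, 0.350)],
}


def pooled_mean(groups):
    """N-weighted average of the group means."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


def mean_gap(groups):
    """Absolute difference between the two interviewers' means."""
    (_, first), (_, second) = groups
    return abs(first - second)


POOLED = {name: pooled_mean(groups) for name, groups in TABLE_37.items()}
GAPS = {name: mean_gap(groups) for name, groups in TABLE_37.items()}

for name in TABLE_37:
    print(f"{name:<9} pooled = {POOLED[name]:.3f}  gap = {GAPS[name]:.3f}")
```

On these figures the *0- tokens show by far the largest gap between interviewers, but that comparison also rests on the fewest informants (8 and 3).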

6.3 The Group and the Individual

Most of what I have presented in this study concerns group tendency and not individual tendency, though individual frequencies are provided in the appendices. I have primarily sought to describe the speech community as a whole. Sometimes individuals varied widely from the group patterns described in chapter five. For example, one 22 year old male informant (informant #18) had almost no occurrences of [l-] in either of the two registers in which he was recorded (impromptu = 1/17 or .059; interview = 3/21 or .143). From our group data, we would predict that a male speaker with his background characteristics would have a much higher number of [l-] occurrences (.502).*

In the case of the informant above (#18), detailed knowledge of his background helps to explain his apparently anomalous behavior. During primary school the informant was identified as having a speech problem and was subsequently provided with remedial speech training. Presumably, he was guided in his training by formal prescriptive norms, which may account for his tendency to employ those norms more widely than would be expected for someone of his background and age.
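The gap between informant #18 and the group prediction can be expressed in token counts. The following sketch assumes, purely for illustration, that the predicted [l-] frequency of .502 can be applied as a per-token rate to the number of tokens he produced in each register; the variable names are hypothetical.

```python
GROUP_RATE = 0.502  # predicted [l-] frequency for a speaker of his background

# register -> ([l-] tokens, total *n- tokens) for informant #18
OBSERVED = {
    "impromptu": (1, 17),
    "interview": (3, 21),
}


def summarize(observed, group_rate):
    """Observed frequency and expected [l-] token count per register."""
    summary = {}
    for register, (l_tokens, total) in observed.items():
        summary[register] = {
            "frequency": l_tokens / total,
            "expected_tokens": group_rate * total,
        }
    return summary


SUMMARY = summarize(OBSERVED, GROUP_RATE)
POOLED_FREQUENCY = (
    sum(l_tokens for l_tokens, _ in OBSERVED.values())
    / sum(total for _, total in OBSERVED.values())
)

for register, row in SUMMARY.items():
    print(
        f"{register:<10} observed = {row['frequency']:.3f}  "
        f"expected [l-] tokens = {row['expected_tokens']:.1f}"
    )
print(f"pooled     observed = {POOLED_FREQUENCY:.3f}")
```

Read this way, the group rate would lead one to expect roughly 19 [l-] tokens across the two registers, against the 4 he actually produced.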

Not all individual variation can be explained so readily. Moreover, some anomalous cases may correlate systematically with variables not dealt with in this study. There are numerous possible explanations for within-group variation, some of which are discussed in the next section as areas for future research. No human behavior is simply explained and one cannot expect any group profile to entirely predict individual language behavior. The number of variables is too great. Nevertheless, it is clear from our data here that group patterns do exist. The most difficult task is to refine and explain the patterns that have emerged. Though I have attempted to resolve some of the questions, many remain for future work.

6.4 Future Directions

We have found here that females, young people and people employing an informal speech register tend to use innovative and (arguably) non-prestige variants. Having established some definite patterns, we might now ask the questions: What do these attributes have in common, if anything? Are they proxy variables (cf. Milroy 1987:101) for some other common attribute such as deference? What is the origin of the three innovations? Did they arise out of contact with other dialects with similar variation or did they develop independently? Once a part of Hong Kong Cantonese, how did the variables pattern the way they did?

It is important to emphasize that we have only established correlation. Correlations do not by themselves explain the variation, but only lead us to ask useful questions. Researchers have only begun to explore the intricacies of variation in Chinese. In the course of this study I encountered countless interesting questions, most of which could not be addressed in this report.

The data in this study clearly show age to be a factor in variation of all three variables. Having established age as a factor, we need to answer the question that follows from the result. That is, does the age correlation indicate a change in progress or an age grading situation? A longitudinal study using a similar framework and data would help to answer the question. I plan to do periodic recording of as many of the same informants as possible over the next ten years as time permits and compare those results with this study. Because of personal relationships with many of our informants, the plan should be feasible.

The evidence concerning sex as an independent variable warrants further investigation. It seems that variation by sex cannot be easily generalized and that, in Chinese, women sometimes innovate and sometimes not, depending on the situation and the variable.

Another possibility is that the variation is better explained by factors related to sex such as a person's wish to defer, which itself may correlate with sex.* The concepts of social network (a la Milroy 1980) and psycho-social space (Milroy 1987:35-36) may also come into play. Listening to tapes of my assistant as she interviewed various informants suggested that accommodation to the interviewees was partly through the choice of one variant over another. For example, with older speakers she often used the /n/ variant, while with younger people not at all. All these factors are potentially interesting avenues of study.

The issue of accommodation presents a potential methodological conflict. I argue in Chapter Four that both my assistant and I were inside participants and that we were unlikely to affect the data in an unnatural way. However, that is not to say we do not affect the data. Participants by definition are part of the speech situation. Consequently, we would expect that we, as anyone, would be part of the speech dynamic. While it seemed clear that my assistant accommodated to her interviewees, we do not know how much the interviewees accommodated to her. It is quite likely they did accommodate to some extent. It is important to remember, though, that accommodation is part of any speech situation. It is one of many variables affecting speech production. It is also a variable that was not formally accounted for in this study and as such is part of the unexplained variation. Moreover, the issue of accommodation is only relevant in the interview portions of the data. The investigators were not usually involved in either the impromptu speech context or in the public speech portion except in a very general way as part of an audience. The problem of accommodation is an important one and would be another interesting topic of future research.

Because my data concerning place of origin were limited, I was not able to draw any strong conclusions concerning the influence of sub-dialect and/or ethnicity.

One possible avenue of future study would be to focus on just the issue of dialect influence by taking a balanced sample of a larger range of Pearl River Delta dialect points and more meticulously correlating sociolinguistic variables with place of origin.

Although it was not a principal goal of this study to examine the effects of linguistic factors on the three variables, it is quite possible that linguistic environment plays an important role in variation (cf. Barale 1982). Reports of variation in other Chinese dialects such as those noted in Chapter Three suggest that a particular phonological environment may favor one variant over another when there is no clearly conditional environment. In addition, syntax and morphology may play a part in the realization of a given variant. The one linguistic variable examined in this study, word class, proved to be an important factor in variation. Other linguistic variables no doubt play a role as well and there remain many possible avenues for follow-up study.

NOTES TO CHAPTER SIX

1. My own daughters, ages 3 and 6 by the time of our departure, never acquired the conservative forms at all. Their Cantonese environment was essentially kindergarten and neighborhood play.

2. The experience of my daughters is also interesting with respect to the role of language contact in variation. Though as English speakers they had no difficulty producing [n-], they did not produce it at all in Cantonese.

3. Place of origin usually follows father's background. Both informants claiming Panyu as their place of origin did so on the basis of father's birthplace. None of my informants had a mother born in Panyu.

4. See Thomason and Kaufman (1988) for a survey of contact literature.

5. Y.R. Chao (1968) among others suggests that the final utterance particle ga is a fusion of ge 'declarative particle' plus the particle a, presumably the emphatic particle. For a discussion of final utterance particles in Cantonese, see Bourgerie (1987), Kwok (1984) and Luke (1988).

6. Kwok (1984) lists the following [l-] initial particles, among others:

la    inchoative, perfective (cf. Mandarin le)
lo    intensification
le    reminds listener of something said before

She lists only the question particle ne as [n-] initial, noting that it is frequently pronounced as /le/ (p.14). Though I have also noticed the alternate pronunciation of ne, it was realized each time as /ne/ in my data base.

7. The table below illustrates the numbers of informants who had all innovative or all conservative forms.

                    *ng-   *0-    *n-    *k-
All innovative      3      5      10     4
All conservative    14     6      4      18
Mixture             24     27     35     27

* The above numbers do not include informants who had less than the minimum token count of 10 for a given variable.
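The classification in note 7 can be sketched in code. The Python below is only an illustration under stated assumptions: the function names are hypothetical, the sample counts are the *ng- figures for single recorded portions in Appendix C, and whether the tallies above pooled an informant's registers before classifying is not stated in the text.

```python
from typing import Optional

MIN_TOKENS = 10  # minimum token count stated beneath the table in note 7


def innovative_frequency(conservative: int, innovative: int) -> Optional[float]:
    """Share of innovative variants among all tokens; None when there are no tokens."""
    total = conservative + innovative
    return innovative / total if total else None


def classify(conservative: int, innovative: int, min_tokens: int = MIN_TOKENS) -> str:
    """Label one set of counts in the manner of note 7 (hypothetical helper)."""
    total = conservative + innovative
    if total < min_tokens:
        return "excluded"  # below the minimum token count
    if conservative == 0:
        return "all innovative"
    if innovative == 0:
        return "all conservative"
    return "mixture"


# ([ng-] count, [0-] count) for *ng- words, copied from Appendix C
examples = {"1.2": (32, 3), "8.2": (33, 0), "38.2": (0, 12), "2.1": (6, 2)}
for case, (ng, zero) in examples.items():
    print(case, classify(ng, zero), innovative_frequency(ng, zero))
```

For case 1.2 the frequency is 3/35, which rounds to the 0.086 listed for that case in Appendix C.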

8. The [l-] frequency for the age, education and sex of informant #18 is .502 (from Table 29). The overall figure for a male of his age group is .658.

9. Timothy Light suggests yet another possibility: that our three variables may be a part of a phonologically encoded honorific system, with the conservative forms being honorific.

APPENDIX A

Subject Background Data Sheets:

Chinese and English Versions


Chinese Version of Background Data Sheet

[Chinese-language form; characters not legible in this reproduction.]

English Translation of Background Data Sheet

Name: ______ Telephone Number: ______ Sex: ______ Age: ______

Birthplace (Province/county/city): ______
If not born in Hong Kong, how long have you been here?
Father's Birthplace (Province/county/city): ______
Mother's Birthplace (Province/county/city): ______
Education Level (in years): ______
Father's Education Level (in years): ______
Mother's Education Level (in years): ______
Occupation: ______
Father's Occupation: ______
Mother's Occupation: ______
If [parents] are deceased or retired, what did they do formerly?
Besides Cantonese, can you speak another Chinese dialect? Yes ___ No ___
If yes, please circle the appropriate number below.
1. Mandarin 2. Shanghaiese 3. Taishanese 4. Xinhui 5. Zhongshan (Shegi) 6. Chaozhou 7. Kejia 8. Shansui 9. Shunde 10. Dongguan 11. Other (Please indicate) ______

Which dialect do you speak with your family at home?
Did you/do you attend a Chinese language school or an English language school?
Primary school? Middle school? Secondary school?
What language did/does your teacher use in school to teach?
Primary school? Middle school? Secondary school?
What languages do you speak besides Chinese?
Have you ever lived outside of Hong Kong? If so, where? ______ How long? ______
What did you do there? Study ___ Work ___ Other (please specify) ____

Case # ____ Social Group ______ Tape # _____

APPENDIX B

Interview Questions Chinese and English Versions

Chinese Version of Interview Questions

[Chinese-language questions; characters not legible in this reproduction.]

English Translation of Interview Questions

Personal Background

How many people are there in your family? What are their ages?

Are you married? Do you have children? How old are they?

Hong Kong Life

Where in Hong Kong were you born? Where in Hong Kong did you grow up? Where do you live now? What are the good things about living where you do now? What are the bad points or inconvenient points about where you are living now? What needs improvement in the area where you live? Do you feel that living in Hong Kong is enjoyable? How do you get along with your neighbors? Besides the place that you are living now, have you lived elsewhere in Hong Kong? What do you like most about Hong Kong? What do you like least? Which street do you feel is the most exciting in Hong Kong? What place in Hong Kong do you like the most?

Hong Kong Change

Do you think that Hong Kong has changed a lot in the past few years? Do you feel that Hong Kong has improved or deteriorated recently? Do you pay attention to city government? These days many people are talking about the '97 question. How do you feel about it? Do you sometimes discuss it with your friends? How does your family feel about it? How do you think that the situation in Hong Kong will change? Are you concerned that the economy will deteriorate? Do you think that Hong Kong will be as free after '97 as it is now? Are you familiar with the White Paper? Do you feel that the White Paper is fair to Hong Kong? Do you feel that the mainland government will honor the provisions of the White Paper? Do you feel that direct elections should be held this year or not? Why?

Work

What sort of work do you do? Do you enjoy your work? Where do you work? Do you go to work by bus, subway or car? Is that a convenient way to get to work?

Education

Where do you study? Where have you studied in the past? What courses do you/did you study? After you graduate, what kind of work do you hope to do? What do you do in this kind of work? How long do you have to study for this kind of work?

(If the person is no longer in school, past experiences can be discussed)

Philosophy of Life

From your perspective, what is a successful person like? If you could change something in your life, for example, study again or change jobs, what would you do? If you had a lot of money, that is, as much as you wanted, what would you use the money for?

APPENDIX C

Individual Data and Results (Main Data Base)


Explanations to Appendix C

Most of the headings for Appendix C are self-explanatory. For those that are not, explanations are provided below.

CASE  The identifying case number represents the individual informants. The number after the decimal point represents the level (i.e., speech register) of the particular recorded portion. Thus, case 18.2 is the data for informant number 18 recorded at the second level, the interview level.

Reg=Register  1=impromptu speech  2=interview  3=public speaking

Birth Pl.=Birth place

Edu. Level=Education Level  To nearest year
Father Ed.=Father's Educational Level  To nearest year
Mother Ed.=Mother's Educational Level  To nearest year

School Language/Home Language  c=Cantonese  m=Mandarin  e=English

Interviewer  0=the author  1=assistant  2=no interviewer for this segment, i.e., not interview data

Social Group  0=church associate  1=work  2=other
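As an illustration of the coding scheme just described, the sketch below decodes a case identifier and a language code. It is a minimal Python sketch; the function and dictionary names are hypothetical and are not part of the original data base.

```python
from typing import List, Tuple

# Code tables copied from the explanations to Appendix C
REGISTER = {1: "impromptu speech", 2: "interview", 3: "public speaking"}
INTERVIEWER = {0: "the author", 1: "assistant", 2: "no interviewer"}
SOCIAL_GROUP = {0: "church associate", 1: "work", 2: "other"}
LANGUAGE = {"c": "Cantonese", "m": "Mandarin", "e": "English"}


def decode_case(case: str) -> Tuple[int, str]:
    """Split a case number such as '18.2' into informant number and register."""
    informant, level = case.split(".")
    return int(informant), REGISTER[int(level)]


def decode_languages(codes: str) -> List[str]:
    """Expand a comma-separated language code such as 'c,e'."""
    # Codes that are not in the key are passed through unchanged.
    return [LANGUAGE.get(code, code) for code in codes.split(",")]


print(decode_case("18.2"))
print(decode_languages("c,e"))
```

Keeping the code tables as plain dictionaries makes it easy to check a decoded row against the key by eye.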

Outcome Data

The outcome data is grouped into five groups: by all ng-/0- data combined, by words with *ng- initial, by *0- initial, by *n- initial and by *k- initial. Several figures are given for each group: the number of instances for each of the two variants, the total token words for the variable and a frequency figure for the innovative form (i.e., occurrences of the innovative variant over the total number of tokens of that variable).

CASE  REG.  SEX  AGE  Birth      Father     Mother     Edu.
                      Place      Birthpl.   Birthpl.   Level

1.1   1     m    37   Hong Kong  Kaiping    Kaiping    9
1.2   2     m    37   Hong Kong  Kaiping    Kaiping    9
1.3   3     m    37   Hong Kong  Kaiping    Kaiping    9
2.1   1     m    32   Hong Kong  Qingyuan   Shunde     7
2.2   2     m    32   Hong Kong  Qingyuan   Shunde     7
2.3   3     m    32   Hong Kong  Qingyuan   Shunde     7
3.3   3     m    35   Hong Kong  Hong Kong  Bao'an     16
4.3   3     f    21   Hong Kong  *          *          11
5.1   1     m    34   Guangdong  Guangdong  Guangdong  16
5.2   2     m    34   Guangdong  Guangdong  Guangdong  16
5.3   3     m    34   Guangdong  Guangdong  Guangdong  16
6.1   1     f    26   Hong Kong  Guangzhou  Bao'an     12
6.2   2     f    26   Hong Kong  Guangzhou  Bao'an     12
6.3   3     f    26   Hong Kong  Guangzhou  Bao'an     12
7.1   1     m    33   Hong Kong  Hong Kong  Hong Kong  14
7.2   2     m    33   Hong Kong  Hong Kong  Hong Kong  14
7.3   3     m    33   Hong Kong  Hong Kong  Hong Kong  14
8.1   1     m    57   Guangzhou  Xinhui     Xinhui     4
8.2   2     m    57   Guangzhou  Xinhui     Xinhui     4
8.3   3     m    57   Guangzhou  Xinhui     Xinhui     4
9.2   2     m    40   Jiangsu    Jiangsu    Jiangsu    8
9.3   3     m    40   Jiangsu    Jiangsu    Jiangsu    8
10.1  1     f    18   Hong Kong  Yingde     Hong Kong  11
10.2  2     f    18   Hong Kong  Yingde     Hong Kong  11
10.3  3     f    18   Hong Kong  Yingde     Hong Kong  11
11.3  3     m    22   Hong Kong  Gaoming    Guangzhou  11
12.1  1     f    26   Hong Kong  Taishan    Taishan    12
12.2  2     f    26   Hong Kong  Taishan    Taishan    12
12.3  3     f    26   Hong Kong  Taishan    Taishan    12
13.1  1     f    23   Hong Kong  Panyu      Xinhui     15
13.2  2     f    23   Hong Kong  Panyu      Xinhui     15
13.3  3     f    23   Hong Kong  Panyu      Xinhui     15
14.1  1     m    44   Hong Kong  Dongguan   Dongguan   6
14.2  2     m    44   Hong Kong  Dongguan   Dongguan   6
14.3  3     m    44   Hong Kong  Dongguan   Dongguan   6
15.1  1     m    37   Hong Kong  Nanhai     Dongguan   10
15.2  2     m    37   Hong Kong  Nanhai     Dongguan   10
15.3  3     m    37   Hong Kong  Nanhai     Dongguan   10
16.1  1     f    36   Hong Kong  Shunde     Siuhung    8
16.2  2     f    36   Hong Kong  Shunde     Siuhung    8
17.1  1     m    36   Hong Kong  Sansui     Hong Kong  6
17.2  2     m    36   Hong Kong  Sansui     Hong Kong  6
17.3  3     m    36   Hong Kong  Sansui     Hong Kong  6
18.1  1     m    22   Hong Kong  Wai Yang   Wai Yeung  9
18.2  2     m    22   Hong Kong  Wai Yang   Wai Yeung  9
19.3  3     f    27   Hong Kong  Bao'an     Guangdong  11
20.1  1     m    27   Hong Kong  *          *          9
20.3  1     m    27   Hong Kong  *          *          9
21.2  2     f    34   Hong Kong  Hong Kong  Dongguan   11
22.1  1     m    30   Hong Kong  Xinhui     Xinhui     14
22.2  2     m    30   Hong Kong  Xinhui     Xinhui     14
23.1  1     f    35   Zhongshan  Shunde     Zhongshan  0

23.2  2     f    35   Zhongshan  Shunde     Zhongshan  0
23.3  3     f    35   Zhongshan  Shunde     Zhongshan  0
24.2  2     f    83   Hong Kong  Dongguan   *          *
25.2  2     f    27   Hong Kong  Bao'an     Bao'an     12
26.2  2     m    83   Shantou    *          *          *
27.2  2     f    36   Macau      Xinhui     Guangzhou  11
28.2  2     f    13   Hong Kong  Gongnuhn   Longmen    8
29.2  2     f    6    Hong Kong  *          *          2
30.2  2     f    87   Hong Kong  Panyu      *          *
31.2  2     m    16   Hong Kong  Whijau     Hong Kong  10
32.2  2     m    14   Hong Kong  Hong Kong  Zhongshan  *
33.1  1     m    23   Hong Kong  Guangzhou  Guangzhou  17
34.1  1     m    23   Hong Kong  Hong Kong  Hong Kong  16
35.1  1     m    35   Hong Kong  Hong Kong  Hong Kong  14
36.1  1     m    36   Hong Kong  Chaozhou   Chaozhou   7
36.2  2     m    36   Hong Kong  Chaozhou   Chaozhou   7
36.3  3     m    36   Hong Kong  Chaozhou   Chaozhou   7
37.2  2     f    59   Xinhui     Xinhui     Xinhui     5
38.2  2     f    12   Hong Kong  Zhongshan  Zhongshan  7
39.2  2     f    15   Hong Kong  Hong Kong  Zhongshan  9
40.1  2     f    35   Hong Kong  Bao'an     Hong Kong  13
41.2  2     f    25   Hong Kong  Zhongshan  Zhongshan  7
42.2  2     f    32   Hong Kong  Zhongshan  Zhongshan  10
43.1  1     f    87   *          *          *          *
44.1  2     m    86   Guangzhou  Guangzhou  Guangzhou  *
45.2  2     m    13   Hong Kong  Hong Kong  Macau      7
46.1  1     f    85   Zhongshan  Zhongshan  Zhongshan  *
47.2  2     m    7    Hong Kong  Hong Kong  Hong Kong  3
48.2  2     m    76   Hong Kong  Bao'an     Bao'an     4
49.2  2     m    15   Hong Kong  Hong Kong  Hong Kong  9

CASE  Father's  Mother's  School  Home   Other               Int.  Social
      Edu.      Edu.      Lang.   Lang.  Dialects                  Group

1.1   6         3         c       c      none                2     0
1.2   6         3         c       c      none                0     0
1.3   6         3         c       c      none                2     0
2.1   6         0         c       c      none                0     0
2.2   6         0         c       c      none                0     0
2.3   6         0         c       c      none                2     0
3.3   3         0         c,e     c,h    Kejia               2     0
4.3   6         0         c,e     c      none                2     0
5.1   9         6         c       c      Mandarin            1     0
5.2   9         6         c       c      Mandarin            1     0
5.3   9         6         c       c      Mandarin            2     0
6.1   6         0         c,e     c      none                0     0
6.2   6         0         c,e     c      none                0     0
6.3   6         0         c,e     c      none                2     0
7.1   9         3         c       c      none                2     0
7.2   9         3         c       c      none                0     0
7.3   9         3         c       c      none                2     0
8.1   10        0         c       c      Mandarin            2     0
8.2   10        0         c       c      Mandarin            0     0
8.3   10        0         c       c      Mandarin            2     0
9.2   0         0         c       c      Mand,Shanghai       1     0
9.3   0         0         c       c      Mand,Shanghai       2     0
10.1  6         0         c,e     c      Kejia               2     0
10.2  6         0         c,e     c      Kejia               0     0
10.3  6         0         c,e     c      Kejia               2     0
11.3  3         0         c       c      none                2     0
12.1  2         0         c,e     c      Taishan             1     0
12.2  2         0         c,e     c      Taishan             1     0
12.3  2         0         c,e     c      Taishan             2     0
13.1  3         0         c,e     c      none                2     0
13.2  3         0         c,e     c      none                0     0
13.3  3         0         c,e     c      none                2     0
14.1  6         6         c       c      Dongguan            2     0
14.2  6         6         c       c      Dongguan            0     0
14.3  6         6         c       c      Dongguan            2     0
15.1  0         0         c       c      none                2     0
15.2  0         0         c       c      none                0     0
15.3  0         0         c       c      none                2     0
16.1  3         0         c       c      none                0     0
16.2  3         0         c       c      none                0     0
17.1  6         9         c       c      none                2     0
17.2  6         9         c       c      none                0     0
17.3  6         9         c       c      none                2     0
18.1  0         0         c       c      none                0     0
18.2  0         0         c       c      none                0     0
19.3  9         6         c       c      none                2     0
20.1  *         *         c       c      none                2     0
20.3  *         *         c       c      none                2     0
21.2  7         0         c       c      none                0     0
22.1  *         *         c,e     c      Mandarin            2     0
22.2  *         *         c,e     c      Mandarin            0     0
23.1  6         0         N/A     c      none                2     0

23.2  6         0         N/A     c      none                0     0
23.3  6         0         N/A     c      none                2     0
24.2  *         *         *       c      Dongguan            1     2
25.2  6         0         c,e     c      none                0     0
26.2  *         *         *       c      Chaozhou, Mandarin  1     2
27.2  10        0         c,e,m   c      Mandarin            1     0
28.2  *         *         c       c      none                1     2
29.2  *         *         c       c      none                1     0
30.2  *         *         *       *      none                1     2
31.2  0         4         c,e     c      Kejia               0     0
32.2  7         0         c,e     c      none                1     0
33.1  8         0         c       c      *                   2     1
34.1  *         *         c       c      Kejia               2     1
35.1  *         *         c,e     c      Mandarin            2     0
36.1  0         0         c       c      Chaozhou            2     0
36.2  0         0         c       c      Chaozhou            0     0
36.3  0         0         c       c      Chaozhou            2     0
37.2  3         0         c       c      Xinhui              0     0
38.2  0         1         c       c      none                0     2
39.2  11        0         c,e     c      none                0     0
40.1  5         0         c,e     c      Kejia               1     0
41.2  5         1         c       c      none                0     0
42.2  0         0         c,e     c      none                1     0
43.1  *         *         *       c      *                   1     2
44.1  *         *         *       c      none                1     2
45.2  11        11        c       c      none                1     0
46.1  *         *         *       *      none                1     2
47.2  *         *         c,e     c      none                1     2
48.2  0         0         c       c      none                1     2
49.2  *         *         c,e     c      none                0     1

CASE  [ng-]  [0-]  ng-     [ng-]  [ng-]  [0-]  *ng-
                   tokens  freq                tokens

1.1   30     10    40      0.250  25     1     26
1.2   33     7     40      0.175  32     3     35
1.3   36     0     36      0.000  36     0     36
2.1   6      4     10      0.400  6      2     8
2.2   18     5     23      0.217  17     2     19
2.3   66     4     70      0.057  64     1     65
3.3   40     0     40      0.000  38     0     38
4.3   27     41    68      0.603  11     34    45
5.1   23     2     25      0.080  21     0     21
5.2   30     2     32      0.063  29     2     31
5.3   68     1     69      0.014  68     1     69
6.1   2      10    12      0.833  2      9     11
6.2   58     27    85      0.318  18     27    45
6.3   28     7     35      0.200  28     7     35
7.1   5      6     11      0.545  5      5     10
7.2   19     11    30      0.367  18     6     24
7.3   21     3     24      0.125  21     2     23
8.1   23     0     23      0.000  22     0     22
8.2   33     0     33      0.000  33     0     33
8.3   39     0     39      0.000  37     0     37
9.2   48     5     53      0.094  46     2     48
9.3   64     0     64      0.000  55     0     55
10.1  2      21    23      0.913  2      17    19
10.2  13     45    58      0.776  13     33    46
10.3  11     7     18      0.389  11     7     18
11.3  50     16    66      0.242  56     12    68
12.1  3      7     10      0.700  3      7     10
12.2  12     12    24      0.500  12     5     17
12.3  42     12    54      0.222  42     10    52
13.1  10     37    47      0.787  8      26    34
13.2  51     23    74      0.311  23     37    60
13.3  22     14    36      0.389  32     11    43
14.1  10     0     10      0.000  8      0     8
14.2  20     0     20      0.000  19     0     19
14.3  27     1     28      0.036  27     0     27
15.1  43     17    60      0.283  39     2     41
15.2  46     10    56      0.179  43     3     46
15.3  17     6     23      0.261  29     0     29
16.1  16     1     17      0.059  15     1     16
16.2  35     0     35      0.000  33     0     33
17.1  24     13    37      0.351  14     6     20
17.2  30     2     32      0.063  30     2     32
17.3  43     8     51      0.157  44     1     45
18.1  24     9     33      0.273  21     9     30
18.2  9      25    34      0.735  22     22    44
19.3  14     7     21      0.333  13     3     16
20.1  50     7     57      0.123  49     1     50
20.3  9      20    29      0.690  18     5     23
21.2  20     7     27      0.259  20     6     26
22.1  32     7     39      0.179  0      *     *
22.2  31     5     36      0.139  31     2     33
23.1  6      11    17      0.647  5      10    15

23.2  33     19    52      0.365  34     8     42
23.3  45     8     53      0.151  45     2     47
24.2  29     4     33      0.121  31     2     33
25.2  11     2     13      0.154  11     2     13
26.2  30     2     32      0.063  32     0     32
27.2  39     25    64      0.391  39     19    58
28.2  4      14    18      0.778  4      12    16
29.2  9      20    29      0.690  9      20    29
30.2  20     1     21      0.048  20     0     20
31.2  16     21    37      0.568  16     19    35
32.2  7      11    18      0.611  7      11    18
33.1  44     23    67      0.343  44     16    60
34.1  4      2     6       0.333  4      2     6
35.1  7      2     9       0.222  9      0     9
36.1  12     5     17      0.294  12     0     12
36.2  26     8     34      0.235  24     0     24
36.3  12     1     13      0.077  12     0     12
37.2  31     1     32      0.031  31     0     31
38.2  0      14    14      1.000  0      12    12
39.2  8      6     14      0.429  7      5     13
40.1  44     21    65      0.323  42     18    60
41.2  41     20    61      0.328  40     17    57
42.2  30     0     30      0.000  29     0     29
43.1  10     0     10      1.000  10     0     10
44.1  11     0     11      1.000  11     0     11
45.2  8      15    23      0.652  8      14    22
46.1  24     0     24      0.000  23     0     23
47.2  10     5     15      0.333  10     5     15
48.2  40     2     42      0.048  38     1     39
49.2  0      17    17      1.000  0      11    11

CASE  [0-] freq  [ng-]  [0-]  *0-     [0-] freq  [n-]
      *ng-                    tokens  (*0-)

1.1   0.038      1      9     10      0.900      9
1.2   0.086      1      4     5       0.800      15
1.3   0.000      5      0     5       0.000      18
2.1   0.250      0      2     2       *          0
2.2   0.105      1      3     4       0.750      2
2.3   0.015      1      3     4       0.750      16
3.3   0.000      2      0     2       0.000      17
4.3   0.756      3      5     8       0.625      3
5.1   0.000      1      2     3       0.667      5
5.2   0.065      29     2     31      0.065      10
5.3   0.014      2      0     2       0.000      38
6.1   0.818      0      1     1       *          0
6.2   0.600      *      *     *       *          35
6.3   0.200      *      *     *       *          0
7.1   0.500      0      1     1       *          0
7.2   0.250      1      5     6       0.833      24
7.3   0.087      0      1     1       *          8
8.1   0.000      3      0     3       0.000      20
8.2   0.000      *      *     *       *          7
8.3   0.000      2      0     2       0.000      21
9.2   0.042      *      *     *       *          15
9.3   0.000      9      0     9       0.000      20
10.1  0.895      *      *     *       *          0
10.2  0.717      0      3     3       1.000      9
10.3  0.389      *      *     *       *          9
11.3  0.176      2      4     6       0.667      17
12.1  0.700      *      *     *       *          8
12.2  0.294      0      7     7       1.000      16
12.3  0.192      1      3     4       0.750      32
13.1  0.765      1      8     9       0.889      3
13.2  0.617      0      15    15      1.000      28
13.3  0.256      *      *     *       *          3
14.1  0.000      2      0     2       0.000      3
14.2  0.000      *      *     *       *          27
14.3  0.000      *      *     *       *          14
15.1  0.049      1      14    15      0.933      8
15.2  0.065      0      10    10      1.000      9
15.3  0.000      0      6     6       1.000      12
16.1  0.063      *      *     *       *          7
16.2  0.000      *      *     *       *          8
17.1  0.300      10     7     17      0.412      1
17.2  0.063      *      *     *       *          10
17.3  0.022      8      7     15      0.467      14
18.1  0.300      0      4     4       1.000      16
18.2  0.500      6      3     9       0.333      18
19.3  0.188      1      4     5       0.800      0
20.1  0.020      6      1     7       0.143      3
20.3  0.217      0      13    13      1.000      9
21.2  0.231      *      *     *       *          3
22.1  *          *      *     *       *          2
22.2  0.061      *      *     *       *          8
23.1  0.667      *      *     *       *          1

23.2  0.190      1      11    12      0.917      10
23.3  0.043      *      *     *       *          21
24.2  0.061      *      *     *       *          14
25.2  0.154      *      *     *       *          27
26.2  0.000      *      *     *       *          18
27.2  0.322      0      6     6       *          13
28.2  0.750      *      *     *       *          0
29.2  0.690      *      *     *       *          0
30.2  0.000      *      *     *       *          12
31.2  0.543      *      *     *       *          13
32.2  0.611      *      *     *       *          10
33.1  0.267      *      *     *       *          5
34.1  0.333      *      *     *       *          2
35.1  0.000      *      *     *       *          1
36.1  0.000      0      5     5       *          12
36.2  0.000      2      8     10      0.800      7
36.3  0.000      0      1     1       *          7
37.2  0.000      0      1     1       *          11
38.2  1.000      0      2     2       1.000      1
39.2  0.385      0      1     1       1.000      2
40.1  0.277      2      3     5       0.600      0
41.2  0.298      1      3     4       0.750      6
42.2  0.000      1      0     1       0.000      4
43.1  0.000      0      0     0       *          12
44.1  1.000      *      *     0       *          4
45.2  0.636      0      1     1       1.000      3
46.1  0.000      1      0     1       *          17
47.2  0.333      *      *     0       *          8
48.2  0.026      2      1     3       *          14
49.2  1.000      0      6     6       *          0

CASE  [l-]  *n-     [l-]   [k'-]  [h-]  *k-     [h-]
            tokens  freq                tokens  freq

1.1   38    47      0.809  10     0     10      0.000
1.2   18    33      0.545  0      21    21      0.000
1.3   44    62      0.710  20     0     20      0.000
2.1   20    20      1.000  10     6     16      0.375
2.2   18    20      0.900  8      3     11      0.273
2.3   49    65      0.754  19     2     21      0.095
3.3   15    32      0.469  9      0     9       0.000
4.3   31    34      0.912  17     5     22      0.227
5.1   19    24      0.792  13     0     13      0.000
5.2   25    35      0.686  0      20    20      1.000
5.3   20    58      0.345  48     0     48      0.000
6.1   14    14      1.000  7      3     10      0.300
6.2   10    45      0.222  10     8     18      0.444
6.3   25    25      1.000  22     4     26      0.154
7.1   14    14      1.000  2      11    13      0.846
7.2   13    37      0.351  39     9     48      0.188
7.3   10    18      0.556  10     0     10      0.000
8.1   6     26      0.231  16     0     16      0.000
8.2   4     11      0.364  17     0     17      0.000
8.3   1     22      0.045  14     0     14      0.000
9.2   9     24      0.375  8      3     11      0.273
9.3   6     26      0.231  16     0     16      0.000
10.1  17    17      1.000  5      25    30      0.833
10.2  20    29      0.690  13     7     20      0.350
10.3  16    25      0.640  9      5     14      0.357
11.3  70    87      0.805  14     0     14      0.000
12.1  8     16      0.500  4      10    14      0.714
12.2  11    27      0.407  4      5     9       0.556
12.3  21    53      0.396  39     6     45      0.133
13.1  37    40      0.925  4      36    40      0.900
13.2  55    83      0.663  3      12    15      0.800
13.3  10    13      0.769  6      1     7       0.143
14.1  4     7       0.571  7      0     7       0.000
14.2  20    47      0.426  15     0     15      0.000
14.3  15    29      0.517  7      0     7       0.000
15.1  21    29      0.724  27     0     27      0.000
15.2  25    34      0.735  6      0     6       0.000
15.3  10    22      0.455  6      0     6       0.000
16.1  20    27      0.741  16     5     21      0.238
16.2  27    35      0.771  36     2     38      0.053
17.1  35    36      0.972  14     14    28      0.500
17.2  39    49      0.796  25     18    43      0.419
17.3  37    51      0.725  54     1     55      0.018
18.1  1     17      0.059  3      1     4       0.250
18.2  3     21      0.143  12     15    27      0.556
19.3  29    29      1.000  10     2     12      0.167
20.1  13    16      0.813  18     2     20      0.100
20.3  20    29      0.690  8      0     8       0.000
21.2  14    17      0.824  13     0     13      0.000
22.1  18    20      0.900  4      8     12      0.667
22.2  27    35      0.771  24     4     28      0.143
23.1  19    20      0.950  9      20    29      0.690

23.2  14    24      0.583  15     4     19      0.211
23.3  11    32      0.344  19     1     20      0.050
24.2  21    35      0.600  22     0     22      0.000
25.2  38    65      0.585  3      13    16      0.813
26.2  0     18      0.000  10     0     10      0.000
27.2  51    64      0.797  13     0     0       0.000
28.2  11    11      1.000  3      9     12      0.750
29.2  10    10      1.000  2      9     11      0.818
30.2  0     12      0.000  5      0     5       0.000
31.2  43    56      0.768  2      28    30      0.933
32.2  16    26      0.615  5      5     10      0.500
33.1  18    23      0.783  0      17    17      1.000
34.1  8     10      0.800  0      13    13      1.000
35.1  10    11      0.909  *      *     *       *
36.1  5     17      0.294  12     0     12      1.000
36.2  24    31      0.774  21     2     23      0.087
36.3  10    17      0.588  *      *     *       *
37.2  8     19      0.421  12     0     12      0.000
38.2  12    13      0.923  1      13    14      0.929
39.2  14    16      0.875  8      6     14      0.429
40.1  34    34      1.000  13     18    31      0.581
41.2  21    27      0.778  24     1     25      0.040
42.2  27    31      0.871  6      4     10      0.400
43.1  0     12      0.000  0      0     0       *
44.1  8     12      0.667  *      *     0       *
45.2  17    20      0.850  *      *     0       *
46.1  0     17      0.000  10     0     10      0.000
47.2  40    48      0.833  8      6     14      0.429
48.2  47    61      0.770  32     0     32      0.000
49.2  24    24      1.000  15     10    25      0.800

APPENDIX D

Group Statistical Results: Main Data Base


Results by Social Background: Descriptive Data

Keys

* = missing data or lack of data
N = number of cases used in test
N* = number of missing cases (i.e., data not available)
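The column headings in the tables of this appendix (N, N*, MEAN, MEDIAN, TRMEAN, STDEV, SEMEAN, MIN, MAX) can be illustrated with a short computation. The Python below is a hedged sketch rather than the procedure actually used for the dissertation: the function name is hypothetical, the sample values are invented, and TRMEAN is assumed to be a trimmed mean that drops the lowest and highest 5% of cases (rounded to the nearest whole case), which the text does not state. Q1 and Q3 are left out because the interpolation rule is not given.

```python
import math
import statistics
from typing import Dict, List, Optional


def describe(values: List[Optional[float]]) -> Dict[str, float]:
    """Descriptive statistics for one variable; None marks a missing case."""
    present = sorted(v for v in values if v is not None)
    n = len(present)
    if n < 2:
        raise ValueError("need at least two non-missing values")
    # Assumption: trim 5% of cases from each end, rounded to the nearest case.
    # (n + 10) // 20 is 0.05 * n rounded half up, using integer arithmetic.
    k = (n + 10) // 20
    trimmed = present[k:n - k]
    stdev = statistics.stdev(present)  # sample standard deviation
    return {
        "N": n,
        "N*": len(values) - n,
        "MEAN": statistics.mean(present),
        "MEDIAN": statistics.median(present),
        "TRMEAN": statistics.mean(trimmed),
        "STDEV": stdev,
        "SEMEAN": stdev / math.sqrt(n),  # standard error of the mean
        "MIN": present[0],
        "MAX": present[-1],
    }


# Invented frequencies for illustration only; one case is missing.
sample = [0.0, 0.0, 0.05, 0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 1.0, None]
for name, value in describe(sample).items():
    print(name, value)
```

The tables are consistent with the standard-error relation used here: for age group 4 in the [0-] freq (*ng) block, 0.2995 divided by the square root of 11 is approximately 0.0903, the SEMEAN reported there.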

Speech Register

reg      N     N*    MEAN      MEDIAN    TRMEAN
[0-] freq (*ng-)
1        22    1     0.3121    0.2823    0.2985
2        38    1     0.2885    0.1722    0.2636
3        19    0     0.1231    0.0222    0.0931
*        1     0     0.00E+00  0.00E+00  0.00E+00
[0-] freq (*0-)
1        10    13    0.594     0.778     0.618
2        9     30    0.772     0.917     0.772
3        11    8     0.392     0.467     0.368
*        0     1     *         *         *
[l-] freq
1        23    0     0.7648    0.8085    0.7872
2        38    1     0.6087    0.6764    0.6215
3        19    0     0.5927    0.5882    0.6009
*        1     0     0.8710    0.8710    0.8710
[h-] freq
1        21    2     0.3847    0.2500    0.3725
2        36    3     0.3783    0.3625    0.3631
3        18    1     0.0747    0.0091    0.0617
*        1     0     0.4000    0.4000    0.4000

reg      STDEV     SEMEAN    MIN       MAX       Q1        Q3
[0-] freq (*ng-)
1        0.3092    0.0659    0.0000    0.8947    0.0150    0.6437
2        0.3183    0.0516    0.0000    1.0000    0.0195    0.5571
3        0.1905    0.0437    0.0000    0.7556    0.0000    0.1923
*        *         *         0.00E+00  0.00E+00  *         *
[0-] freq (*0-)
1        0.418     0.132     0.000     1.000     0.107     0.950
2        0.340     0.113     0.065     1.000     0.567     1.000
3        0.396     0.119     0.000     1.000     0.000     0.750
*        *         *         *         *         *         *
[l-] freq
1        0.2547    0.0531    0.0588    1.0000    0.7241    0.9500
2        0.3070    0.0498    0.0000    1.0000    0.3990    0.8364
3        0.2585    0.0593    0.0455    1.0000    0.3962    0.7692
*        *         *         0.8710    0.8710    *         *
[h-] freq
1        0.3882    0.0847    0.0000    1.0000    0.0000    0.7737
2        0.3470    0.0578    0.0000    1.0000    0.0000    0.7077
3        0.1030    0.0243    0.0000    0.3571    0.0000    0.1456
*        *         *         0.4000    0.4000    *         *

Sex

sex      N     N*    MEAN      MEDIAN    TRMEAN
[0-] freq (*ng-)
0        48    1     0.1661    0.0452    0.1358
1        32    1     0.3810    0.2855    0.3678
[0-] freq (*0-)
0        23    26    0.4769    0.4667    0.4747
1        7     26    0.8901    0.9167    0.8901
[l-] freq
0        49    0     0.6115    0.7097    0.6214
1        32    1     0.7153    0.7747    0.7460
[h-] freq
0        45    4     0.2525    0.0000    0.2284
1        31    2     0.3896    0.3500    0.3796

sex      STDEV     SEMEAN    MIN       MAX       Q1        Q3
[0-] freq (*ng-)
0        0.2515    0.0363    0.0000    1.0000    0.0000    0.2625
1        0.3141    0.0555    0.0000    1.0000    0.0853    0.6974
[0-] freq (*0-)
0        0.4133    0.0862    0.0000    1.0000    0.0000    0.9000
1        0.1386    0.0524    0.6250    1.0000    0.8000    1.0000
[l-] freq
0        0.2792    0.0399    0.0000    1.0000    0.4003    0.8023
1        0.2922    0.0516    0.0000    1.0000    0.5837    0.9875
[h-] freq
0        0.3502    0.0522    0.0000    1.0000    0.0000    0.4645
1        0.3139    0.0564    0.0000    0.9290    0.1333    0.7140

Age Group

age-cat  N     N*    MEAN      MEDIAN    TRMEAN
[0-] freq (*ng-)
1        12    0     0.6624    0.6628    0.6616
2        21    1     0.3672    0.2940    0.3617
3        36    1     0.0950    0.0401    0.0705
4        11    0     0.0988    0.0000    0.0096
[0-] freq (*0-)
1        1     11    1.0000    1.0000    1.0000
2        10    12    0.7457    0.8444    0.7892
3        17    20    0.5140    0.667     0.516
4        2     9     0.0000    0.0000    0.0000
[l-] freq
1        12    0     0.8537    0.8790    0.8629
2        22    0     0.7009    0.7803    0.7180
3        36    1     0.6692    0.7248    0.6740
4        11    0     0.2816    0.2308    0.2587
[h-] freq
1        11    1     0.6481    0.7500    0.6495
2        22    0     0.4139    0.2750    0.4053
3        34    3     0.2120    0.0341    0.1736
4        9     2     0.0000    0.0000    0.0000

age-cat  STDEV     SEMEAN    MIN       MAX       Q1        Q3
[0-] freq (*ng-)
1        0.2279    0.0658    0.3330    1.0000    0.4274    0.8586
2        0.2455    0.0536    0.0200    0.8180    0.1899    0.6083
3        0.1513    0.0252    0.0000    0.6667    0.0000    0.1005
4        0.2995    0.0903    0.0000    1.0000    0.0000    0.0260
[0-] freq (*0-)
1        *         *         1.0000    1.0000    *         *
2        0.3049    0.0964    0.1429    1.0000    0.5521    1.0000
3        0.415     0.1010    0.0000    1.0000    0.000     0.908
4        0.0000    0.0000    0.0000    0.0000    *         *
[l-] freq
1        0.1441    0.0416    0.6154    1.0000    0.7092    1.0000
2        0.2841    0.0606    0.0588    1.0000    0.5402    0.9151
3        0.2219    0.0370    0.2308    1.0000    0.4809    0.8198
4        0.2984    0.0900    0.0000    0.7700    0.0000    0.6000
[h-] freq
1        0.2342    0.0706    0.3500    0.9333    0.4290    0.8333
2        0.3406    0.0726    0.0000    1.0000    0.1405    0.7355
3        0.3028    0.0519    0.0000    1.0000    0.0000    0.3812
4        0.0000    0.0000    0.0000    0.0000    0.0000    0.0000

Education Group

ed-cat   N     N*    MEAN      MEDIAN    TRMEAN
[0-] freq (*ng-)
1        16    0     0.1458    0.0241    0.1174
2        41    1     0.2908    0.1923    0.2682
3        15    1     0.2327    0.2500    0.2097
*        8     0     0.303     0.030     0.303
[0-] freq (*0-)
1        6     10    0.299     0.206     0.299
2        17    25    0.7030    0.8000    0.7301
3        7     9     0.493     0.667     0.493
*        0     8     *         *         *
[l-] freq
1        16    0     0.5999    0.5774    0.6110
2        41    1     0.6989    0.7679    0.7149
3        16    0     0.7324    0.7770    0.7410
*        8     0     0.360     0.300     0.360
[h-] freq
1        16    0     0.1959    0.0091    0.1654
2        39    3     0.3024    0.2381    0.2817
3        15    1     0.484     0.581     0.482
*        6     2     0.208     0.000     0.208

ed-cat   STDEV     SEMEAN    MIN       MAX       Q1        Q3
[0-] freq (*ng-)
1        0.2344    0.0586    0.0000    0.6897    0.0000    0.2726
2        0.3088    0.0482    0.0000    1.0000    0.0292    0.5214
3        0.2398    0.0619    0.0000    0.7647    0.0145    0.3333
*        0.415     0.147     0.000     1.000     0.000     0.715
[0-] freq (*0-)
1        0.372     0.152     0.000     0.917     0.000     0.579
2        0.3613    0.0876    0.000     1.000     0.479     1.000
3        0.453     0.171     0.000     1.000     0.000     0.889
*        *         *         *         *         *         *
[l-] freq
1        0.2852    0.0713    0.0455    1.0000    0.3780    0.8612
2        0.2578    0.0403    0.0588    1.0000    0.5648    0.8875
3        0.2092    0.0523    0.3448    1.0000    0.5827    0.9068
*        0.404     0.143     0.000     1.000     0.000     0.654
[h-] freq
1        0.2814    0.0703    0.0000    0.8182    0.0000    0.4264
2        0.3131    0.0501    0.0000    1.0000    0.0000    0.4444
3        0.423     0.109     0.000     1.000     0.000     0.900
*        0.332     0.136     0.000     0.750     0.000     0.563

Birthplace

birthpl  N     N*    MEAN      MEDIAN    TRMEAN
[0-] freq (*ng-)
1        72    2     0.2670    0.1820    0.2396
2        1     0     0.00E+00  0.00E+00  0.00E+00
4        4     0     0.225     0.117     0.225
5        2     0     0.0208    0.0208    0.0208
*        1     0     0.00E+00  0.00E+00  0.00E+00
[0-] freq (*0-)
1        28    46    0.5815    0.7083    0.5878
2        0     1     *         *         *
4        1     3     0.91667   0.91667   0.91667
5        1     1     0.00E+00  0.00E+00  0.00E+00
*        0     1     *         *         *
[l-] freq
1        73    1     0.6842    0.7679    0.7053
2        1     0     0.42100   0.42100   0.42100
4        4     0     0.469     0.464     0.469
5        2     0     0.3029    0.3029    0.3029
*        1     0     0.00E+00  0.00E+00  0.00E+00
[h-] freq
1        69    5     0.3220    0.1875    0.3051
2        1     0     0.00E+00  0.00E+00  0.00E+00
4        4     0     0.238     0.130     0.238
5        2     0     0.136     0.136     0.136
*        0     1     *         *         *

birthpl  STDEV     SEMEAN    MIN       MAX       Q1        Q3
[0-] freq (*ng-)
1        0.2999    0.0353    0.0000    1.0000    0.0036    0.4722
2        *         *         0.00E+00  0.00E+00  *         *
4        0.306     0.153     0.000     0.667     0.011     0.548
5        0.0295    0.0208    0.0000    0.0417    *         *
*        *         *         0.00E+00  0.00E+00  *         *
[0-] freq (*0-)
1        0.4010    0.0758    0.0000    1.0000    0.0841    0.9833
2        *         *         *         *         *         *
4        *         *         0.91667   0.91667   *         *
5        *         *         0.00E+00  0.00E+00  *         *
*        *         *         *         *         *         *
[l-] freq
1        0.2681    0.0314    0.0000    1.0000    0.5505    0.8915
2        *         *         0.42100   0.42100   *         *
4        0.400     0.200     0.000     0.950     0.086     0.858
5        0.1020    0.0721    0.2308    0.3750    *         *
*        *         *         0.00E+00  0.00E+00  *         *
[h-] freq
1        0.3467    0.0417    0.0000    1.0000    0.0000    0.5683
2        *         *         0.00E+00  0.00E+00  *         *
4        0.315     0.157     0.000     0.690     0.013     0.570
5        0.193     0.136     0.000     0.273     *         *
*        *         *         *         *         *         *

Father's Birthplace (Place of Origin)

fathbpl  N     N*    MEAN      MEDIAN    TRMEAN
[0-] freq (*ng-)
1        42    1     0.2852    0.1952    0.2626
2        11    1     0.1247    0.0385    0.0746
3        6     0     0.2634    0.2587    0.2634
4        10    0     0.367     0.274     0.334
5        5     0     0.00833   0.0000    0.00833
*        6     0     0.254     0.040     0.254
[0-] freq (*0-)
1        13    30    0.6859    0.8333    0.7198
2        5     7     0.380     0.000     0.380
3        4     2     0.619     0.667     0.619
4        2     8     0.875     0.875     0.875
5        2     3     0.400     0.400     0.400
*        4     2     0.339     0.345     0.339
[l-] freq
1        42    1     0.7198    0.7696    0.7371
2        12    0     0.5499    0.4830    0.5553
3        6     0     0.412     0.416     0.412
4        10    0     0.7556    0.8245    0.8194
5        5     0     0.452     0.375     0.452
*        6     0     0.622     0.739     0.622
[h-] freq
1        39    4     0.3185    0.1875    0.2977
2        12    0     0.1844    0.0000    0.1498
3        6     0     0.306     0.175     0.306
4        10    0     0.3653    0.3536    0.3405
5        4     1     0.318     0.136     0.318
*        5     1     0.409     0.227     0.409

fathbpl  STDEV     SEMEAN    MIN       MAX       Q1        Q3
[0-] freq (*ng-)
1        0.2961    0.0457    0.0000    1.0000    0.0384    0.5250
2        0.2133    0.0643    0.0000    0.7000    0.0000    0.1923
3        0.2307    0.0942    0.0000    0.5429    0.0150    0.5107
4        0.377     0.119     0.000     1.000     0.012     0.762
5        0.01863   0.00833   0.0000    0.04167   0.0000    0.02083
*        0.364     0.149     0.000     0.756     0.000     0.706
[0-] freq (*0-)
1        0.3595    0.0997    0.0000    1.0000    0.4392    0.9667
2        0.522     0.233     0.0000    1.000     0.000     0.950
3        0.447     0.223     0.143     1.000     0.190     1.000
4        0.177     0.125     0.750     1.000     *         *
5        0.566     0.400     0.000     0.800     *         *
*        0.356     0.178     0.000     0.667     0.016     0.656
[l-] freq
1        0.2356    0.0363    0.0000    1.0000    0.5804    0.9131
2        0.2889    0.0834    0.0455    1.0000    0.3718    0.7992
3        0.382     0.156     0.000     0.812     0.044     0.779
4        0.2927    0.0925    0.0000    1.0000    0.6772    0.9423
5        0.225     0.101     0.231     0.774     0.262     0.681
*        0.380     0.155     0.000     1.000     0.259     0.934
[h-] freq
1        0.3415    0.0547    0.0000    1.0000    0.0000    0.5810
2        0.2850    0.0823    0.0000    0.7140    0.0000    0.4524
3        0.371     0.152     0.000     0.933     0.000     0.650
4        0.3090    0.0977    0.0000    0.9290    0.0814    0.5083
5        0.472     0.236     0.000     1.000     0.000     0.818
*        0.470     0.210     0.000     1.000     0.000     0.909

Mother's Birthplace

Mothbp   N     N*    MEAN      MEDIAN    TRMEAN
[0-] freq (*ng-)
1        37    1     0.2820    0.1875    0.2556
2        14    1     0.2149    0.0733    0.1870
3        6     0     0.2634    0.2587    0.2634
4        9     0     0.355     0.298     0.355
5        8     0     0.0533    0.0111    0.0533
*        6     0     0.254     0.040     0.254
[0-] freq (*0-)
1        10    28    0.698     0.817     0.748
2        7     8     0.541     0.889     0.541
3        4     2     0.619     0.667     0.619
4        1     8     0.91667   0.91667   0.91667
5        4     4     0.420     0.439     0.420
*        4     2     0.339     0.345     0.339
[l-] freq
1        37    1     0.7296    0.7700    0.7507
2        15    0     0.5970    0.6627    0.6085
3        6     0     0.412     0.416     0.412
4        9     0     0.660     0.778     0.660
5        8     0     0.5945    0.6569    0.5945
*        6     0     0.622     0.739     0.622
[h-] freq
1        34    4     0.2955    0.1771    0.2682
2        15    0     0.2704    0.1333    0.2427
3        6     0     0.306     0.175     0.306
4        9     0     0.361     0.400     0.361
5        7     1     0.316     0.273     0.316
*        5     1     0.409     0.227     0.409

Mothbp   STDEV     SEMEAN    MIN       MAX       Q1        Q3
[0-] freq (*ng-)
1        0.3110    0.0511    0.0000    1.0000    0.0207    0.4444
2        0.2787    0.0745    0.0000    0.7647    0.0000    0.3747
3        0.2307    0.0942    0.0000    0.5429    0.0150    0.5107
4        0.346     0.115     0.000     1.000     0.021     0.639
5        0.1024    0.0362    0.0000    0.3000    0.0000    0.0573
*        0.364     0.149     0.000     0.756     0.000     0.706
[0-] freq (*0-)
1        0.385     0.122     0.000     1.000     0.500     1.000
2        0.508     0.192     0.000     1.000     0.000     1.000
3        0.447     0.223     0.143     1.000     0.190     1.000
4        *         *         0.91667   0.91667   *         *
5        0.328     0.164     0.000     0.800     0.103     0.717
*        0.356     0.178     0.000     0.667     0.016     0.656
[l-] freq
1        0.2418    0.0397    0.0000    1.0000    0.5780    0.9545
2        0.2786    0.0719    0.0455    1.0000    0.3962    0.8085
3        0.382     0.156     0.000     0.812     0.044     0.779
4        0.317     0.106     0.000     0.950     0.464     0.899
5        0.2682    0.0948    0.2308    0.9722    0.3143    0.7904
*        0.380     0.155     0.000     1.000     0.259     0.934
[h-] freq
1        0.3364    0.0577    0.0000    1.0000    0.0000    0.4786
2        0.3459    0.0893    0.0000    0.9000    0.0000    0.6667
3        0.371     0.152     0.000     0.933     0.000     0.650
4        0.318     0.106     0.000     0.929     0.045     0.595
5        0.366     0.138     0.000     1.000     0.000     0.500
*        0.470     0.210     0.000     1.000     0.000     0.909

School Language

schoollg N     N*    MEAN      MEDIAN    TRMEAN
[0-] freq (*ng-)
1        48    1     0.1895    0.0556    0.1613
2        23    0     0.4269    0.3850    0.4200
*        9     1     0.1391    0.0606    0.1391
[0-] freq (*0-)
1        23    26    0.5472    0.6667    0.5516
2        6     17    0.616     0.778     0.616
*        1     9     0.91667   0.91667   0.91667
[l-] freq
1        48    1     0.6093    0.6997    0.6193
2        23    0     0.7706    0.8710    0.7858
*        10    0     0.588     0.686     0.616
[h-] freq
1        44    5     0.2434    0.0463    0.2178
2        23    0     0.4771    0.4444    0.4781
*        9     1     0.1955    0.0500    0.1955

schoollg STDEV     SEMEAN    MIN       MAX       Q1        Q3
[0-] freq (*ng-)
1        0.2767    0.0399    0.0000    1.0000    0.0000    0.2902
2        0.2974    0.0620    0.0000    1.0000    0.1923    0.7000
*        0.2148    0.0716    0.0000    0.6667    0.0000    0.2106
[0-] freq (*0-)
1        0.4084    0.0852    0.0000    1.0000    0.0000    0.9333
2        0.442     0.180     0.000     1.000     0.107     1.000
*        *         *         0.91667   0.91667   *         *
[l-] freq
1        0.2746    0.0396    0.0000    1.0000    0.4221    0.7894
2        0.2548    0.0531    0.2222    1.0000    0.6154    1.0000
*        0.361     0.114     0.000     0.950     0.258     0.902
[h-] freq
1        0.3403    0.0513    0.0000    1.0000    0.0000    0.3688
2        0.3104    0.0647    0.0000    0.9333    0.1538    0.8000
*        0.2836    0.0945    0.0000    0.6897    0.0000    0.4386

Other Dialects

dial.    N     N*    MEAN      MEDIAN    TRMEAN
[0-] freq (*ng-)
0        46    1     0.3232    0.2500    0.3063
1        8     1     0.0471    0.0172    0.0471
2        5     0     0.343     0.277     0.343
3        3     0     0.188     0.176     0.188
4        4     0     0.233     0.158     0.233
5        1     0     0.00E+00  0.00E+00  0.00E+00
6        5     0     0.188     0.043     0.188
7        3     0     0.0000    0.0000    0.00000
*        5     0     0.154     0.000     0.154
[0-] freq (*0-)
0        14    33    0.7548    0.8944    0.7973
1        6     3     0.312     0.104     0.312
2        2     3     0.0000    0.0000    0.0000
3        1     2     0.66667   0.66667   0.66667
4        2     2     0.500     0.500     0.500
5        0     1     *         *         *
6        2     3     0.458     0.458     0.458
7        1     2     0.8000    0.8000    0.8000
*        2     3     0.6875    0.6875    0.6875
[l-] freq
0        46    1     0.7130    0.7747    0.7319
1        9     0     0.6211    0.6897    0.6211
2        5     0     0.693     0.768     0.693
3        3     0     0.622     0.640     0.622
4        4     0     0.6508    0.6313    0.6508
5        1     0     0.00E+00  0.00E+00  0.00E+00
6        5     0     0.460     0.375     0.460
7        3     0     0.552     0.588     0.552
*        5     0     0.515     0.754     0.515
[h-] freq
0        44    3     0.3486    0.2865    0.3335
1        9     0     0.212     0.000     0.212
2        5     0     0.470     0.581     0.470
3        3     0     0.119     0.000     0.119
4        4     0     0.236     0.071     0.236
5        1     0     0.00E+00  0.00E+00  0.00E+00
6        5     0     0.245     0.211     0.245
7        2     1     0.500     0.500     0.500
*        3     2     0.1075    0.0952    0.1075

dial.    STDEV     SEMEAN    MIN       MAX       Q1        Q3
[0-] freq (*ng-)
0        0.3086    0.0455    0.0000    1.0000    0.0591    0.6028
1        0.0736    0.0260    0.0000    0.2174    0.0000    0.0635
2        0.382     0.171     0.000     0.895     0.000     0.719
3        0.195     0.112     0.000     0.389     0.000     0.389
4        0.278     0.139     0.000     0.617     0.015     0.526
5        *         *         0.00E+00  0.00E+00  *         *
6        0.277     0.124     0.000     0.667     0.021     0.429
7        0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
*        0.336     0.150     0.000     0.756     0.000     0.385
[0-] freq (*0-)
0        0.3196    0.0854    0.0000    1.0000    0.4529    1.0000
1        0.420     0.172     0.000     1.000     0.000     0.750
2        0.0000    0.0000    0.0000    0.00000   *         *
3        *         *         0.66667   0.66667   *         *
4        0.707     0.500     0.000     1.000     *         *
5        *         *         *         *         *         *
6        0.648     0.458     0.000     0.917     *         *
7        *         *         0.80000   0.80000   *         *
*        0.0884    0.0625    0.6250    0.7500    *         *
[l-] freq
0        0.2659    0.0392    0.0000    1.0000    0.5533    0.9235
1        0.2424    0.0808    0.2308    0.9000    0.3542    0.8021
2        0.338     0.151     0.231     1.000     0.350     1.000
3        0.192     0.111     0.421     0.805     0.421     0.805
4        0.0876    0.0438    0.5714    0.7692    0.5786    0.7426
5        *         *         0.00E+00  0.00E+00  *         *
6        0.335     0.150     0.045     0.950     0.195     0.767
7        0.242     0.140     0.294     0.774     0.294     0.774
*        0.474     0.212     0.000     0.912     0.000     0.910
[h-] freq
0        0.3326    0.0501    0.0000    1.0000    0.0000    0.5556
1        0.366     0.122     0.000     1.000     0.000     0.405
2        0.447     0.200     0.000     0.933     0.000     0.883
3        0.206     0.119     0.000     0.357     0.000     0.357
4        0.382     0.191     0.000     0.800     0.000     0.636
5        *         *         0.00E+00  0.00E+00  *         *
6        0.273     0.122     0.000     0.690     0.025     0.481
7        0.707     0.500     0.000     1.000     *         *
*        0.1141    0.0659    0.0000    0.2273    0.0000    0.2273

Interviewer

int                    N     N*      MEAN     MEDIAN     TRMEAN
[0-] freq(*ng)   0    27      0    0.3072     0.2308     0.2918
                 1    19      1    0.2886     0.0645     0.2637
                 2    34      1    0.1879     0.0457     0.1576
[0-] freq(*0)    0     8     19    0.8604     0.9583     0.8604
                 1     3     17    0.577      0.667      0.577
                 2    19     16    0.4519     0.4667     0.4462
[l-] freq        0    27      0    0.6558     0.7407     0.6659
                 1    19      1    0.6061     0.6857     0.6186
                 2    35      0    0.6752     0.7255     0.6888
[h-] freq        0    27      0    0.3534     0.2730     0.3417
                 1    16      4    0.3763     0.4145     0.3586
                 2    33      2    0.2388     0.0500     0.2028

int                  STDEV    SEMEAN       MIN       MAX        Q1        Q3
[0-] freq(*ng)   0   0.3113   0.0599    0.0000    1.0000    0.0625    0.5429
                 1   0.3340   0.0766    0.0000    1.0000    0.0000    0.6360
                 2   0.2549   0.0437    0.0000    0.8947    0.0000    0.2750
[0-] freq(*0)    0   0.2278   0.0805    0.3333    1.0000    0.8083    1.0000
                 1   0.474    0.274     0.065     1.000     0.065     1.000
                 2   0.4108   0.0942    0.0000    1.0000    0.0000    0.8889
[l-] freq        0   0.2618   0.0504    0.0588    1.0000    0.4255    0.8235
                 1   0.3698   0.0848    0.0000    1.0000    0.3750    0.8830
                 2   0.2598   0.0439    0.0455    1.0000    0.4688    0.9091
[h-] freq        0   0.3293   0.0634    0.0000    1.0000    0.0400    0.5556
                 1   0.3455   0.0864    0.0000    1.0000    0.0000    0.6808
                 2   0.3446   0.0600    0.0000    1.0000    0.0000    0.4286

Father's Education Level

fathed-c               N     N*      MEAN     MEDIAN     TRMEAN
[0-] freq(*ng)   1    50      0    0.2494     0.1294     0.2217
                 2    15      0    0.2155     0.1875     0.1998
                 *    15      2    0.2976     0.0606     0.2665
[0-] freq(*0)    1    21     29    0.6520     0.8000     0.6680
                 2     7      8    0.338      0.065      0.338
                 *     2     15    0.571      0.571      0.571
[l-] freq        1    50      0    0.6701     0.7304     0.6836
                 2    15      0    0.6211     0.6857     0.6362
                 *    16      1    0.6270     0.7857     0.6452
[h-] freq        1    49      1    0.2981     0.2273     0.2816
                 2    14      1    0.295      0.083      0.261
                 *    13      4    0.362      0.143      0.337

fathed-c             STDEV    SEMEAN       MIN       MAX        Q1        Q3
[0-] freq(*ng)   1   0.2917   0.0412    0.0000    1.0000    0.0000    0.4167
                 2   0.2257   0.0583    0.0000    0.6360    0.0000    0.3850
                 *   0.3767   0.0973    0.0000    1.0000    0.0000    0.6897
[0-] freq(*0)    1   0.3821   0.0834    0.0000    1.0000    0.3725    1.0000
                 2   0.405    0.153     0.000     0.833     0.000     0.800
                 *   0.606    0.429     0.143     1.000     *         *
[l-] freq        1   0.2494   0.0353    0.0588    1.0000    0.4652    0.8783
                 2   0.2937   0.0758    0.0455    1.0000    0.3514    0.8500
                 *   0.3917   0.0979    0.0000    1.0000    0.1500    0.9068
[h-] freq        1   0.3172   0.0453    0.0000    1.0000    0.0000    0.5278
                 2   0.392    0.105     0.000     1.000     0.000     0.586
                 *   0.390    0.108     0.000     1.000     0.000     0.775

Mother's Education Level

mothed-c               N     N*      MEAN     MEDIAN     TRMEAN
[0-] freq(*ng)   1    61      0    0.2407     0.1538     0.2176
                 2     4      0    0.255      0.181      0.255
                 *    15      2    0.2976     0.0606     0.2665
[0-] freq(*0)    1    26     35    0.5838     0.7750     0.5908
                 2     2      2    0.4392     0.4392     0.4392
                 *     2     15    0.571      0.571      0.571
[l-] freq        1    61      0    0.6472     0.7097     0.6587
                 2     4      0    0.8359     0.8230     0.8359
                 *    16      1    0.6270     0.7857     0.6452
[h-] freq        1    60      1    0.2967     0.1771     0.2741
                 2     3      1    0.312      0.419      0.312
                 *    13      4    0.362      0.143      0.337

mothed-c             STDEV    SEMEAN       MIN       MAX        Q1        Q3
[0-] freq(*ng)   1   0.2785   0.0357    0.0000    1.0000    0.0000    0.3869
                 2   0.282    0.141     0.022     0.636     0.032     0.552
                 *   0.3767   0.0973    0.0000    1.0000    0.0000    0.6897
[0-] freq(*0)    1   0.4187   0.0821    0.0000    1.0000    0.0000    0.9500
                 2   0.0388   0.0275    0.4118    0.4667    *         *
                 *   0.606    0.429     0.143     1.000     *         *
[l-] freq        1   0.2619   0.0335    0.0455    1.0000    0.4233    0.8473
                 2   0.1042   0.0521    0.7255    0.9722    0.7431    0.9417
                 *   0.3917   0.0979    0.0000    1.0000    0.1500    0.9068
[h-] freq        1   0.3368   0.0435    0.0000    1.0000    0.0000    0.5417
                 2   0.258    0.149     0.018     0.500     0.018     0.500
                 *   0.390    0.108     0.000     1.000     0.000     0.775

Social Group

soc-grp                N     N*      MEAN     MEDIAN     TRMEAN
[0-] freq(*ng)   0    61      1    0.2472     0.1050     0.2215
                 1    13      1    0.3612     0.2980     0.3360
                 2     6      0    0.0657     0.0000     0.0657
[0-] freq(*0)    0    29     33    0.5655     0.6667     0.5704
                 1     0     14    *          *          *
                 2     1      5    0.8000     0.8000     0.8000
[l-] freq        0    62      0    0.6608     0.7248     0.6762
                 1    13      1    0.622      0.770      0.644
                 2     6      0    0.6330     0.6870     0.6330
[h-] freq        0    61      1    0.2731     0.1667     0.2515
                 1    10      4    0.478      0.590      0.473
                 2     5      1    0.400      0.000      0.400

soc-grp              STDEV    SEMEAN       MIN       MAX        Q1        Q3
[0-] freq(*ng)   0   0.2924   0.0374    0.0000    1.0000    0.0072    0.3869
                 1   0.3337   0.0925    0.0000    1.0000    0.0130    0.6628
                 2   0.1334   0.0544    0.0000    0.3333    0.0000    0.1288
[0-] freq(*0)    0   0.4113   0.0764    0.0000    1.0000    0.0323    0.9667
                 1   *        *         *         *         *         *
                 2   *        *         0.8000    0.8000    *         *
[l-] freq        0   0.2755   0.0350    0.0000    1.0000    0.4473    0.9000
                 1   0.372    0.103     0.000     1.000     0.292     0.867
                 2   0.2387   0.0974    0.2941    0.9091    0.3893    0.8273
[h-] freq        0   0.3008   0.0385    0.0000    1.0000    0.0000    0.4722
                 1   0.430    0.136     0.000     1.000     0.000     0.847
                 2   0.548    0.245     0.300     1.000     0.000     1.000

Results by Social Background: Analysis of Variance Tests for Main Factors

Analysis of Variance: [0-] Frequency (*ng-)

Factor     Levels   Values
reg             3   1 2 3
sex             2   0 1
age-cat         4   1 2 3 4
ed-cat          3   1 2 3
origin          5   1 2 3 4

Analysis of Variance for [*ng-] Frequency (*0-)

Source     DF    Seq SS    Adj SS    Adj MS      F       P
reg         2   0.28419   0.21156   0.10578   4.18   0.025
sex         1   0.74361   0.15184   0.15184   6.00   0.020
age-cat     3   0.72715   0.35537   0.11846   4.68   0.008
ed-cat      2   0.09077   0.09914   0.04957   1.96   0.158
origin      4   0.02132   0.02132   0.00533   0.21   0.931
Error      31   0.78417   0.78417   0.02530
Total      43   2.65121

Unusual Observations for [*ng-] Frequency (*0-)

Obs.   [0-] freq        Fit   Stdev.Fit    Residual   St.Resid
  3     0.015385   0.015385    0.159047    0.000000          * X
 38     0.500000   0.223508    0.084464    0.276492       2.05R
 44     0.060606   0.332691    0.089362   -0.272085      -2.07R
 45     0.666667   0.335875    0.077470    0.330791       2.38R

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.
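The standardized residuals in the listing above can be checked by hand. The sketch below assumes the usual regression identity in which the variance of a fitted value equals the error mean square times the leverage, so that a standardized residual is the raw residual divided by sqrt(MSE * (1 - h)). That the statistics package computed the column in exactly this way is an assumption, and the function names are illustrative only; the figures are copied from the ANOVA table and the listing of unusual observations.

```python
import math

# Error mean square (Adj MS in the Error row of the ANOVA table).
MSE = 0.02530

# (observation, observed value, fitted value, standard deviation of the fit),
# copied from the listing of unusual observations.
UNUSUAL = [
    (38, 0.500000, 0.223508, 0.084464),
    (44, 0.060606, 0.332691, 0.089362),
    (45, 0.666667, 0.335875, 0.077470),
]


def leverage(se_fit, mse=MSE):
    """Leverage h implied by Var(fit) = MSE * h (assumed identity)."""
    return se_fit ** 2 / mse


def standardized_residual(observed, fit, se_fit, mse=MSE):
    """Raw residual divided by its estimated standard error."""
    residual = observed - fit
    return residual / math.sqrt(mse * (1.0 - leverage(se_fit, mse)))


for obs, y, fit, se_fit in UNUSUAL:
    print(obs, round(standardized_residual(y, fit, se_fit), 2))
```

Under the same assumption, observation 3 has a leverage very close to 1 (its Stdev.Fit squared is nearly equal to the error mean square), which is consistent with its X flag and with the missing standardized residual.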

Analysis of Variance: [0-] Frequency (*0-)

Factor     Levels   Values
reg             3   1 2 3
sex             2   0 1
age-cat         4   1 2 3 4
ed-cat          3   1 2 3
origin          5   1 2 3 4 5

Analysis of Variance for [0-] Frequency (*0-)

Source     DF   Seq SS   Adj SS   Adj MS      F       P
reg         2   0.7524   0.3367   0.1683   1.53   0.253
sex         1   0.4706   0.2086   0.2086   1.90   0.192
age-cat     3   0.5002   0.0632   0.0211   0.19   0.900
ed-cat      2   0.3433   0.6473   0.3236   2.94   0.088
origin      4   0.6444   0.6444   0.1611   1.46   0.269
Error      13   1.4298   1.4298   0.1100
Total      25   4.1406

Unusual Observations for [0-] Frequency (*0-)

Obs.   [0-] freq       Fit   Stdev.Fit   Residual   St.Resid
  3      0.75000   0.75000     0.33164    0.00000          * X
 16      0.00000  -0.00000     0.33164    0.00000          * X
 18      1.00000   1.00000     0.33164    0.00000          * X

X denotes an obs. whose X value gives it large influence.

Analysis of Variance for [l-] Frequency

Factor     Levels   Values
reg             3   1 2 3
sex             2   0 1
age-cat         4   1 2 3 4
ed-cat          3   1 2 3
origin          5   1 2 3 4 5

Analysis of Variance for [l-] Frequency

Source     DF    Seq SS    Adj SS    Adj MS      F       P
reg         2   0.21312   0.40622   0.20311   4.43   0.020
sex         1   0.24864   0.00277   0.00277   0.06   0.807
age-cat     3   0.47149   0.47536   0.15845   3.45   0.028
ed-cat      2   0.05035   0.02746   0.01373   0.30   0.743
origin      4   0.54665   0.54665   0.13666   2.98   0.034
Error      32   1.46773   1.46773   0.04587
Total      44   2.99798
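The F column of the table above follows from the other columns: each adjusted mean square is the adjusted sum of squares divided by its degrees of freedom, and each F is that mean square divided by the error mean square. The following sketch recomputes the F ratios for the [l-] frequency table; the dictionary layout and the function name are illustrative assumptions, while the numbers are copied from the table.

```python
# Error row of the ANOVA table for [l-] frequency.
ERROR_DF = 32
ERROR_SS = 1.46773

# source: (DF, Adj SS, F as reported in the table)
TABLE = {
    "reg": (2, 0.40622, 4.43),
    "sex": (1, 0.00277, 0.06),
    "age-cat": (3, 0.47536, 3.45),
    "ed-cat": (2, 0.02746, 0.30),
    "origin": (4, 0.54665, 2.98),
}


def f_ratio(adj_ss, df, error_ss=ERROR_SS, error_df=ERROR_DF):
    """F = (Adj SS / DF) / (Error SS / Error DF)."""
    return (adj_ss / df) / (error_ss / error_df)


for source, (df, adj_ss, reported) in TABLE.items():
    print(f"{source:8s} computed F = {f_ratio(adj_ss, df):.2f}  reported F = {reported:.2f}")
```

The P column is not recomputed here because it requires the F distribution, which is outside the Python standard library.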

Unusual Observations for [l-] Frequency

Obs.   [l-] freq       Fit   Stdev.Fit   Residual   St.Resid
  3      0.75385   0.75385     0.21416    0.00000          * X
  9      0.22222   0.70082     0.09077   -0.47860      -2.47R
 37      0.05882   0.50271     0.10561   -0.44388      -2.38R

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.

Analysis of Variance for [h-] Frequency

Factor     Levels   Values
reg             3   1 2 3
sex             2   0 1
age-cat         4   1 2 3 4
ed-cat          3   1 2 3
origin          5   1 2 3 4 5

Analysis of Variance for [h-] Frequency

Source     DF    Seq SS    Adj SS    Adj MS      F       P
reg         2   0.65717   0.51878   0.25939   5.33   0.010
sex         1   0.57011   0.09109   0.09109   1.87   0.181
age-cat     3   0.72630   0.50698   0.16899   3.47   0.027
ed-cat      2   0.21588   0.19222   0.09611   1.97   0.156
origin      4   0.06136   0.06136   0.01534   0.31   0.866
Error      32   1.55863   1.55863   0.04871
Total      44   3.78945

Unusual Observations for [h-] Frequency

Obs.   [h-] freq       Fit   Stdev.Fit   Residual   St.Resid
  3      0.09524   0.09524     0.22070    0.00000          * X
 55      0.93333   0.57857     0.17173    0.35476       2.56R

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.

APPENDIX E

Individual Data and Results (Data Base 2: Speech Register)


CASE   REG.  SEX  AGE  Birth Place   Father Birthpl.   Mother Birthpl.   Edu. Level

 1.1    1     m    37   Hong Kong     Kaiping           Kaiping            9
 1.2    2     m    37   Hong Kong     Kaiping           Kaiping            9
 1.3    3     m    37   Hong Kong     Kaiping           Kaiping            9
 2.1    1     m    32   Hong Kong     Qingyuan          Shunde             7
 2.2    2     m    32   Hong Kong     Qingyuan          Shunde             7
 2.3    3     m    32   Hong Kong     Qingyuan          Shunde             7
 5.1    1     m    34   Guangdong     Guangdong         Guangdong         16
 5.2    2     m    34   Guangdong     Guangdong         Guangdong         16
 5.3    3     m    34   Guangdong     Guangdong         Guangdong         16
 6.1    1     f    26   Hong Kong     Guangzhou         Bao'an            12
 6.2    2     f    26   Hong Kong     Guangzhou         Bao'an            12
 6.3    3     f    26   Hong Kong     Guangzhou         Bao'an            12
 7.1    1     m    33   Hong Kong     Hong Kong         Hong Kong         14
 7.2    2     m    33   Hong Kong     Hong Kong         Hong Kong         14
 7.3    3     m    33   Hong Kong     Hong Kong         Hong Kong         14
 8.1    1     m    57   Guangzhou     Xinhui            Xinhui             4
 8.2    2     m    57   Guangzhou     Xinhui            Xinhui             4
 8.3    3     m    57   Guangzhou     Xinhui            Xinhui             4
10.1    1     f    18   Hong Kong     Yingde            Hong Kong         11
10.2    2     f    18   Hong Kong     Yingde            Hong Kong         11
10.3    3     f    18   Hong Kong     Yingde            Hong Kong         11
12.1    1     f    26   Hong Kong     Taishan           Taishan           12
12.2    2     f    26   Hong Kong     Taishan           Taishan           12
12.3    3     f    26   Hong Kong     Taishan           Taishan           12
13.1    1     f    23   Hong Kong     Panyu             Xinhui            15
13.2    2     f    23   Hong Kong     Panyu             Xinhui            15
13.3    3     f    23   Hong Kong     Panyu             Xinhui            15
14.1    1     m    44   Hong Kong     Dongguan          Dongguan           6
14.2    2     m    44   Hong Kong     Dongguan          Dongguan           6
14.3    3     m    44   Hong Kong     Dongguan          Dongguan           6
15.1    1     m    37   Hong Kong     Nanhai            Dongguan          10
15.2    2     m    37   Hong Kong     Nanhai            Dongguan          10
15.3    3     m    37   Hong Kong     Nanhai            Dongguan          10
17.1    1     m    36   Hong Kong     Sansui            Hong Kong          6
17.2    2     m    36   Hong Kong     Sansui            Hong Kong          6
17.3    3     m    36   Hong Kong     Sansui            Hong Kong          6
23.1    1     f    35   Zhongshan     Shunde            Zhongshan          0
23.2    2     f    35   Zhongshan     Shunde            Zhongshan          0
23.3    3     f    35   Zhongshan     Shunde            Zhongshan          0
36.1    1     m    36   Hong Kong     Chaozhou          Chaozhou           7
36.2    2     m    36   Hong Kong     Chaozhou          Chaozhou           7
36.3    3     m    36   Hong Kong     Chaozhou          Chaozhou           7

CASE   Father's   Mother's   School   Home    Other       Int.   Social
       Edu.       Edu.       Lang.    Lang.   Dialects           Group

 1.1      6          3        c        c      none         2       0
 1.2      6          3        c        c      none         0       0
 1.3      6          3        c        c      none         2       0
 2.1      6          0        c        c      none         0       0
 2.2      6          0        c        c      none         0       0
 2.3      6          0        c        c      none         2       0
 5.1      9          6        c        c      Mandarin     1       0
 5.2      9          6        c        c      Mandarin     1       0
 5.3      9          6        c        c      Mandarin     2       0
 6.1      6          0        c,e      c      none         0       0
 6.2      6          0        c,e      c      none         0       0
 6.3      6          0        c,e      c      none         2       0
 7.1      9          3        c        c      none         2       0
 7.2      9          3        c        c      none         0       0
 7.3      9          3        c        c      none         2       0
 8.1     10          0        c        c      Mandarin     2       0
 8.2     10          0        c        c      Mandarin     0       0
 8.3     10          0        c        c      Mandarin     2       0
10.1      6          0        c,e      c      Kejia        2       0
10.2      6          0        c,e      c      Kejia        0       0
10.3      6          0        c,e      c      Kejia        2       0
12.1      2          0        c,e      c      Taishan      1       0
12.2      2          0        c,e      c      Taishan      1       0
12.3      2          0        c,e      c      Taishan      2       0
13.1      3          0        c,e      c      none         2       0
13.2      3          0        c,e      c      none         0       0
13.3      3          0        c,e      c      none         2       0
14.1      6          6        c        c      Dongguan     2       0
14.2      6          6        c        c      Dongguan     0       0
14.3      6          6        c        c      Dongguan     2       0
15.1      0          0        c        c      none         2       0
15.2      0          0        c        c      none         0       0
15.3      0          0        c        c      none         2       0
17.1      6          9        c        c      none         2       0
17.2      6          9        c        c      none         0       0
17.3      6          9        c        c      none         2       0
23.1      6          0        N/A      c      none         2       0
23.2      6          0        N/A      c      none         0       0
23.3      6          0        N/A      c      none         2       0
36.1      0          0        c        c      Chaozhou     2       0
36.2      0          0        c        c      Chaozhou     0       0
36.3      0          0        c        c      Chaozhou     2       0

CASE   [ng-]   [0-]   ng-      [ng-] freq   [ng-]   [0-]   *ng-
                      tokens                               tokens

 1.1     30     10      40       0.250        25      1      26
 1.2     33      7      40       0.175        32      3      35
 1.3     36      0      36       0.000        36      0      36
 2.1      6      4      10       0.400         6      2       8
 2.2     18      5      23       0.217        17      2      19
 2.3     66      4      70       0.057        64      1      65
 5.1     23      2      25       0.080        21      0      21
 5.2     30      2      32       0.063        29      2      31
 5.3     68      1      69       0.014        68      1      69
 6.1      2     10      12       0.833         2      9      11
 6.2     58     27      85       0.318        18     27      45
 6.3     28      7      35       0.200        28      7      35
 7.1      5      6      11       0.545         5      5      10
 7.2     19     11      30       0.367        18      6      24
 7.3     21      3      24       0.125        21      2      23
 8.1     23      0      23       0.000        22      0      22
 8.2     33      0      33       0.000        33      0      33
 8.3     39      0      39       0.000        37      0      37
10.1      2     21      23       0.913         2     17      19
10.2     13     45      58       0.776        13     33      46
10.3     11      7      18       0.389        11      7      18
12.1      3      7      10       0.700         3      7      10
12.2     12     12      24       0.500        12      5      17
12.3     42     12      54       0.222        42     10      52
13.1     10     37      47       0.787         8     26      34
13.2     51     23      74       0.311        23     37      60
13.3     22     14      36       0.389        32     11      43
14.1     10      0      10       0.000         8      0       8
14.2     20      0      20       0.000        19      0      19
14.3     27      1      28       0.036        27      0      27
15.1     43     17      60       0.283        39      2      41
15.2     46     10      56       0.179        43      3      46
15.3     17      6      23       0.261        29      0      29
17.1     24     13      37       0.351        14      6      20
17.2     30      2      32       0.063        30      2      32
17.3     43      8      51       0.157        44      1      45
23.1      6     11      17       0.647         5     10      15
23.2     33     19      52       0.365        34      8      42
23.3     45      8      53       0.151        45      2      47
36.1     12      5      17       0.294        12      0      12
36.2     26      8      34       0.235        24      0      24
36.3     12      1      13       0.077        12      0      12
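The frequency column in the table above can be recovered from the token counts that precede it: in each row the fourth figure equals the [0-] count divided by the total number of tokens. A minimal sketch of that calculation follows; the function name and the handling of cases with no tokens are assumptions made for illustration, and the three rows are copied from cases 1.1, 6.1 and 8.1.

```python
def variant_frequency(variant_tokens, other_tokens):
    """Share of tokens realized with the variant; None when there are no tokens."""
    total = variant_tokens + other_tokens
    if total == 0:
        return None  # a missing value, shown as * in the tables
    return variant_tokens / total


# (case, [ng-] tokens, [0-] tokens), copied from the first two count columns.
rows = [("1.1", 30, 10), ("6.1", 2, 10), ("8.1", 23, 0)]

for case, ng_tokens, zero_tokens in rows:
    print(case, round(variant_frequency(zero_tokens, ng_tokens), 3))
```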

CASE [0-1 freq [ng-1 [0-1 *0- [0-1 freq [n-1 *ng- tokens (*0-1

1.1 0.038 1 9 10 0.900 9 1.2 0.086 1 4 5 0.800 15 1.3 0.000 5 0 5 0.000 18 2.1 0.250 0 2 2 * 0 2.2 0.105 1 3 4 0.750 2 2.3 0.015 3 4 0.750 16 5.1 0.000 1 2 3 0.667 5 5.2 0.065 29 2 31 0.065 10 5.3 0.014 2 0 2 0.000 38 6.1 0.818 0 1 1 * 0 6.2 0.600 * ** * 35 6.3 0.200 ** * * 0 7.1 0.500 0 1 1 * 0 7.2 0.250 1 5 6 0.833 24 7.3 0.087 0 1 1 * 8 8.1 0.000 3 0 3 0.000 20 8.2 0.000 ** * * 7 8.3 0.000 2 0 2 0.000 21 10.1 0.895 * ** * 0 10.2 0.717 0 3 3 1.000 9 10.3 0.389 *** 8.000 9 12.1 0.700 * * ** 8 12.2 0.294 0 7 7 1.000 16 12.3 0.192 1 3 4 0.750 32 13.1 0.765 1 8 9 0.889 3 13.2 0.617 0 15 15 1.000 28 13.3 0.256 *** * 3 14.1 0.000 2 0 2 0.000 3 14.2 0.000 ** ** 27 14.3 0.000 * *** 14 15.1 0.049 1 14 15 0.933 8 15.2 0.065 0 10 10 1.000 9 15.3 0.000 0 6 6 1.000 12 17.1 0.300 10 7 17 0.412 1 17.2 0.063 ** * 8.000 10 17.3 0.022 8 7 15 0.467 14 23.1 0.667 **** 1 23.2 0.190 1 11 12 0.917 10 23.3 0.043 *** * 21 36.1 0.000 0 5 5 * 12 36.2 0.000 2 8 10 0.800 7 36.3 0.000 0 1 1 * 7 226

CASE   [l-]   *n-      [l-] freq   [k'-]   [h-]   *k-      [h-] freq
              tokens                              tokens

 1.1    38      47       0.809       10      0      10       0.000
 1.2    18      33       0.545        0     21      21       0.000
 1.3    44      62       0.710       20      0      20       0.000
 2.1    20      20       1.000       10      6      16       0.375
 2.2    18      20       0.900        8      3      11       0.273
 2.3    49      65       0.754       19      2      21       0.095
 5.1    19      24       0.792       13      0      13       0.000
 5.2    25      35       0.686        0     20      20       1.000
 5.3    20      58       0.345       48      0      48       0.000
 6.1    14      14       1.000        7      3      10       0.300
 6.2    10      45       0.222       10      8      18       0.444
 6.3    25      25       1.000       22      4      26       0.154
 7.1    14      14       1.000        2     11      13       0.846
 7.2    13      37       0.351       39      9      48       0.188
 7.3    10      18       0.556       10      0      10       0.000
 8.1     6      26       0.231       16      0      16       0.000
 8.2     4      11       0.364       17      0      17       0.000
 8.3     1      22       0.045       14      0      14       0.000
10.1    17      17       1.000        5     25      30       0.833
10.2    20      29       0.690       13      7      20       0.350
10.3    16      25       0.640        9      5      14       0.357
12.1     8      16       0.500        4     10      14       0.714
12.2    11      27       0.407        4      5       9       0.556
12.3    21      53       0.396       39      6      45       0.133
13.1    37      40       0.925        4     36      40       0.900
13.2    55      83       0.663        3     12      15       0.800
13.3    10      13       0.769        6      1       7       0.143
14.1     4       7       0.571        7      0       7       0.000
14.2    20      47       0.426       15      0      15       0.000
14.3    15      29       0.517        7      0       7       0.000
15.1    21      29       0.724       27      0      27       0.000
15.2    25      34       0.735        6      0       6       0.000
15.3    10      22       0.455        6      0       6       0.000
17.1    35      36       0.972       14     14      28       0.500
17.2    39      49       0.796       25     18      43       0.419
17.3    37      51       0.725       54      1      55       0.018
23.1    19      20       0.950        9     20      29       0.690
23.2    14      24       0.583       15      4      19       0.211
23.3    11      32       0.344       19      1      20       0.050
36.1     5      17       0.294       12      0      12       1.000
36.2    24      31       0.774       21      2      23       0.087
36.3    10      17       0.588        *      *       *       *

APPENDIX F

Group Statistical Results (Data Base 2: Speech Register)

Results for Speech Register: Descriptive Data

reg                    N     N*      MEAN     MEDIAN     TRMEAN
[0-]freq(*ng)    1    14      0    0.3558     0.2750     0.3406
                 2    14      0    0.2180     0.0955     0.1945
                 3    14      0    0.0870     0.0188     0.0691
[0-]freq(*0)     1     7      7    0.543      0.667      0.543
                 2     8      6    0.827      0.958      0.827
                 3     6      8    0.369      0.233      0.369
/l/freq          1    14      0    0.8048     0.9375     0.8364
                 2    14      0    0.5815     0.6230     0.5849
                 3    14      0    0.5603     0.5721     0.5666
/h/freq          1    14      0    0.3684     0.3375     0.3548
                 2    14      0    0.3743     0.3115     0.3533
                 3    13      1    0.0731     0.0182     0.0540

reg                  STDEV    SEMEAN       MIN       MAX        Q1        Q3
[0-]freq(*ng)    1   0.3533   0.0944    0.0000    0.8947    0.0000    0.7162
                 2   0.2489   0.0665    0.0000    0.7174    0.0469    0.3705
                 3   0.1234   0.0330    0.0000    0.3889    0.0000    0.1942
[0-]freq(*0-)    1   0.413    0.156     0.0000    0.933     0.000     0.900
                 2   0.318    0.113     0.065     1.000     0.808     1.000
                 3   0.438    0.179     0.0000    1.000     0.000     0.812
[l-]freq         1   0.2639   0.0705    0.2308    1.0000    0.6860    1.0000
                 2   0.2006   0.0536    0.2222    0.9000    0.3962    0.7450
                 3   0.2369   0.0633    0.0455    1.0000    0.3834    0.7326
[h-]freq         1   0.3706   0.0991    0.0000    0.9000    0.0000    0.7438
                 2   0.3548   0.0948    0.0000    1.0000    0.0000    0.6167
                 3   0.1048   0.0291    0.0000    0.3571    0.0000    0.1381
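The SEMEAN column above is the standard error of the mean, that is, STDEV divided by the square root of N from the first descriptive table. The sketch below checks this for the three registers of the [0-] frequency (*ng) variable; the function name is an illustrative assumption, and small discrepancies are expected because the tabulated standard deviations are rounded.

```python
import math


def standard_error(stdev, n):
    """Standard error of the mean: STDEV / sqrt(N)."""
    return stdev / math.sqrt(n)


# (register, N, STDEV, SEMEAN as reported) for [0-] freq (*ng).
rows = [
    (1, 14, 0.3533, 0.0944),
    (2, 14, 0.2489, 0.0665),
    (3, 14, 0.1234, 0.0330),
]

for reg, n, stdev, reported in rows:
    print(reg, round(standard_error(stdev, n), 4), reported)
```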

Results for Speech Register: Analysis of Variance Tests

Analysis of Variance for [0-] Frequency (*ng-)

Factor   Type    Levels   Values
reg      fixed        3   1 2 3

Analysis of Variance for [0-] Frequency (*ng-)

Source   DF        SS        MS      F       P
reg       2   0.50575   0.25287   3.75   0.032
Error    39   2.62639   0.06734
Total    41   3.13213

MEANS

reg    N   [0-]freq(*ng-)
1     14   0.35581
2     14   0.21798
3     14   0.08705

Analysis of Variance for [0-] Frequency (*0-)

Factor   Type    Levels   Values
reg      fixed        3   1 2 3

Analysis of Variance for [*0-] /0-/ Frequency

Source   DF       SS       MS      F       P
reg       2   0.7532   0.3766   2.52   0.109
Error    18   2.6929   0.1496
Total    20   3.4461

MEANS

reg    N   [0-]freq(*0-)
1      7   0.54295
2      8   0.82681
3      6   0.36944
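The register sum of squares in the one-way table above can be reconstructed from the MEANS listing, since the between-group sum of squares is the sum over registers of N times the squared deviation of the register mean from the grand mean. The sketch below does this for the [0-] frequency (*0-) analysis and then forms the F ratio against the tabulated error term; the function and variable names are illustrative assumptions, and the figures are copied from the tables.

```python
def between_group_ss(groups):
    """Sum of n_i * (mean_i - grand_mean)**2 over (n_i, mean_i) pairs."""
    total_n = sum(n for n, _ in groups)
    grand_mean = sum(n * mean for n, mean in groups) / total_n
    return sum(n * (mean - grand_mean) ** 2 for n, mean in groups)


# (N, mean) for registers 1 to 3, from the MEANS listing for [0-]freq(*0-).
groups = [(7, 0.54295), (8, 0.82681), (6, 0.36944)]

# Error row of the corresponding ANOVA table.
ERROR_SS = 2.6929
ERROR_DF = 18

ss_reg = between_group_ss(groups)
df_reg = len(groups) - 1
f_reg = (ss_reg / df_reg) / (ERROR_SS / ERROR_DF)

print(f"SS(reg) = {ss_reg:.4f}, F = {f_reg:.2f}")
```

The recomputed values should agree with the tabulated SS of 0.7532 and F of 2.52 up to rounding of the group means.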

Analysis of Variance for [l-] Frequency

Factor   Type    Levels   Values
reg      fixed        3   1 2 3

Analysis of Variance for [l-] Frequency

Source   DF        SS        MS      F       P
reg       2   0.51387   0.25693   4.64   0.016
Error    39   2.15798   0.05533
Total    41   2.67185

MEANS

reg    N   [l-] freq
1     14   0.80485
2     14   0.58152
3     14   0.56032

Analysis of Variance for [h-] Frequency

Factor   Type    Levels   Values
reg      fixed        3   1 2 3

Analysis of Variance for [h-] Frequency

Source   DF        SS        MS      F       P
reg       2   0.78981   0.39491   4.22   0.022
Error    38   3.55458   0.09354
Total    40   4.34440

MEANS

reg    N   [h-] freq
1     14   0.36843
2     14   0.37426
3     13   0.07312

APPENDIX G

List of Individual Tokens: Romanization, Characters and Parts of Speech


Token List for ng-/0-

By initial *ng-

[ng-] Realizations

iLtm dihng*ngahk*jou ‘Elder’s Quorum' n mm giu*ngouh (8) ‘arrogant’ sv ji.ngoih (15) ‘beyond a point' adv ling*ngoih (18) ‘another’ n lihng*ngoih*yâtgo (11) ‘another one’ n mh*ngâam (11) ‘incorrect’ sv mh»ngâam*ge ‘incorrect sv *i@^ mh*ngok ‘not vicious’ sv #&# ngaa*gwok*syù (7) ‘Book of Jacob’ n bS ngâahn (7) ‘eyes' n ngâahn*gwông (2) ‘foresight' n m$È ngâahn*géng ‘eyeglasses’ n R&B# ngâahn*jing ‘in one’s eyes' n ngàahn*sik ‘color’ . n *Ojg ngâak (9) ‘deceive’ V *# ngâam (37) ‘correct’ sv ngâam*ngâam (14) ‘just now' sv ngâam»saai ‘all correct' sv ngâauh ‘bite, gnaw' V ngàh (2) ‘tooth’ n ngàh*chîm ‘toothpick’ n m ngàhn ‘silver’ n *°S.9l ngahp*tâuh (5) ‘nod head' V ngai»hihm (3) ‘dangerous’ a4i ngai*seuht (5) ‘art’ n ngàuh (12) “cattle' n ngâuh*jeuhng (2) ‘idol’ n ngàuhopèih (3) ‘leather’ n ngàuh*yuhk . (8) ‘beef n 233

ngoh (1301) ‘lorme* pro ngôhsdeih (574) ‘we or us' pro ngôh*ge ‘mine* pro mm ngôh*gwok ‘Russia’ n m m A ngôh*gwok*yàhn (2) ‘Russian* n mx ngôhamàhn (2) ‘Russian* pro affi ngôh*mùhn ‘we or us (written)* pro ngôhng*gwai ‘expensive* sv ngoih*gà (2) ‘wife's family* n ^m ngoih*gwok (22) 'abroad* n ngoih»gwok«yàhn (6) ‘foreigner* n ngoih*gwok*yàhn ‘foreigner* n ngoih*syùn ‘a daughter's son* n ngoih»yeuhng ‘abroad* pro M ngok (5) ‘vicious* sv ngûk*jyû ‘landlord* n pôuh*touh*ngàh (2) ‘Portugal* n pôuh*touh*ngàh*wâ (2) ‘Portuguese* n ##^A pôuh«tôuh«ngàh*yàhn (4) ‘Portuguese* n sihn*ngohk ‘good and evil* n gpm^ Dahng Ngàhn Siu (name)‘a name* n ## yàm*ngohk (4) ‘music* n em# yàm*ngohk*wûih ‘concert* n yâuh*ngoi ‘be affectionate* sv

[0-] realizations

aa*gwok« syu ‘Jacob* n m àahn (5) ‘eyes* n àahn*siu (3) ‘given name* n DJë âak (4) ‘deceive* V dJbA âakyahn (4) ‘deceive people* V âam (22) ‘correct* V âam*âam (22) ‘just now* adv âam*sàn«dî ‘well fitted* sv mn àhn«hôhng ‘bank* n ài*hlm«sing ‘dangerousness* n 234 mm aih*seuht (2) ‘art’ n e àuh ‘cattle' n àuhajeuhng ‘idol’ n f & àuhvpèih (4) ‘leather’ n 4 ^ # àuh*si (5) ‘cow manure’ n che«ok ‘wicked’ sv m m gîu*ouh ‘arrogant’ sv ji*oih ‘besides’ adv Uhng«oih (5) ‘beyond a point' adv mh«oi ‘not to want' V ôh (396) ‘I, me’ pro ôhadeih (83) ‘we, us’ pro oih«gwok (15) ‘abroad’ n oih«gwok*yàhn (4) ‘foreigner’ n oih«yàhn (2) ‘stranger’ n m ok (4) ‘vicious’ sv mx ô*màhn (2) ‘Russian’ n Dahng Àhn.Siu ‘a name' n mm uk*leunhng ‘ceiling beam’ n yih*‘oi ‘besides’ adv

By initial *0-

[ng-] realizations

AiSlHl Daaih«bouh Ngâau 'TaiPo Hallow’ n bat*ngôn ‘unsettled’ V bô#ngôn#sih ‘village name' n bô*ngôn*yuhn (7) ‘district name’ n bôeungôn (4) ‘village name’ n m m bûnengük (2) ‘move house’ n ching*ôn ‘pay respects’ adv daap*ngôn ‘answer’ V dùng»nga ‘’ n 235

f6ng*ngûk ‘room' n mm geuiangûk (3) ‘residence' n màhn«ngaih ‘art’ n i# Y # nàahm«ngâ«dôu (1) (2) ‘Lamma Island' n ngâak*saù ‘shake hands' v(vo) ngaap 'duck' n m ngaat (2) ‘pressure' n m:h ngaat*lihk (5) ‘pressure' n m ngâi (4) ‘to be short' sv ngâi«di (2) ‘shorter' sv ngâisjô ‘get shorter' sv gk ngâu (2) ‘a surname' n W:m ngâu*jàu (2) ‘Europe' n ngoi (24) ‘love' n ngoi*jî (2) ‘beloved son' n ngoi*sâm (5) ‘love' n ngo*jâi ‘goose’ n ngôn (6 ) ‘according to' V ngôn*chyuhn (6 ) ‘safe’ sv ngôn*dihng ( 2 ) ‘to settle' V ngôn*laahp (3) ‘to establish’ V ngônelohk ‘peace &happiness' sv ngôn«pàaih (5) ‘to arrange' V ngôn«sàu (3) ‘lay on hands' V ngôn*wai ‘to comfort' V m n ngou*mùhn ( 2 ) ‘Macau' n m ngûk ( 1 0 ) ‘house' n ngük#kei (3) ‘home' n ngük*leuhng ‘ceiling beam’ n pihngangon ( 2 ) ‘peace' sv lêlM tùhng*nguk ‘roommate' n

[0-] realizations

âakesàu ‘shake hands' v aa«lùhn ‘Aaron’ n 236

aan«jau*fàahn ‘lunch' n âap (4) ‘duck' n aaUlihk ‘pressure' n m n àhn*hôhng ‘bank' n m âi ‘short' sv àî«di ‘shorter' sv Ôi*kahp ‘Egypt’ n a»jàu (8) ‘Asia' n am ‘dark' sv an*jau (5) ‘afternoon' n(tw) âu ‘a surname' n nlM àu (3) ‘to vomit' V gcitti âu«jàu (8) ‘Europe’ n àu«sei ‘vomit profusely' V bât«ôn ‘unsettled' V bôuaôn (2) ‘village name' n ± #[W Daaih«bouh Âau Taipo Hallow' n fS îS ^ Gâan‘Oi Gà ‘a name' n *Fh1M gâanengük ‘house' n muhk*ûk (7) ‘wood houses' n I# Y # nàahm*â«dôu (2) ‘Lamma Island' n g oi (10) ‘love' n oi*gwok*yàhn ‘foreigner' n ok ‘vicious' sv m ôn ‘to press' V ôn«chyuhn ‘to be safe' sv ôn«dî ‘later' sv ôn*geui ‘according to ’ v(cv) $ # ôn*heuhng (2) ‘serene' V ôn*jihng (5) ‘quiet' sv ôn*kèih (3) ‘a given name' n ^iL ôn«laahp (7) ‘establish’ V ôn*lohk ‘enjoy' sv $ * oùapàaih (27) ‘to arrange' V ôn«sâu (2) ‘lay on hands’ V ôn«sâu*lâîh (2) ceremony of n laying on of hands’ ôn»wai (3) ‘to comfort' V ôn»wan (2) ‘stable' adv f ê # ôn*ying ‘pitch a tent' V 237

S in ou*muhn (18) ‘Macau* n pîhng«ngôn (2) ‘peace’ adv sihk«ôn ‘eat lunch' n *|5]M tùhng*ük ‘roommate' n ngûk (13) ‘house' n ùk«chyün (4) ‘a village' n ük*kéi (43) ‘home’ n ùk«kéiyàhn (7) ‘family’ n ük*leuhng (3) ‘ceiling beam’ n 238

Token List for ng-/0-: Arranged by part of speech

By initial *ng-

[ng-] Realizations

ngàam*ngâam (14) 'just now' adv ji*ngoih (15) 'beyond a point' n Daaih*bouh Ngaau 'Tai Po Hallow' n(pl) z t m dihng«ngâk*jôu 'Elders Quorum' n lihng«ngoih (18) 'another' n lihng*ngoih*yâtgo (11) 'another one' n #&* ngaa*gwok*syu (7) 'Book of Jacob' n m ngàahn (9) 'eyes' n mit ngâahn*gwông (2) 'foresight' n É0# ngâahnajing 'in one's eyes' n BSBf ngâahn*jing 'eye' n Mfe ngàahn*sik 'color' n ngàh (2) 'tooth' n ngàh»chim 'toothpick' n m ngàhn 'silver' n mm ngai«seuht (5) 'art' n ngàuh (12) 'cattle' n ngâuh«jeuhng (2) 'idol' n ngàuh*pèih (3) 'leather' n f 1^ ngàuh*yuhk (8) 'beef n mm ngoh*gwok 'Russia' n mmA ngôh»gwok*yàhn (2) 'Russian' n mic ngôh*màhn (2) 'Russian' n ngoihagà (2) 'wife's family' n n^m ngoih*gwok (22) 'abroad' n ngoih*gwok*yàhn (6) 'foreigner' n ngoih*yeuhng 'abroad' n ngoih*syün a daughter's son' n 239

ngûk«jyù 'landlord' n pôuh«touh*ngàh (2) Tortugal' n pôuh*touh*ngàh*wâ (2) 'Portuguese' n pôuh*tôuh*ngàh»y àhn (4) 'Portuguese' n sihn*ngohk 'good and evil' n Dahng Ngàhn Siu 'a name' n yàm»ngohk (4) 'music' n yàm*ngohk*wûih 'concert' n ngoh (1301) 'I or me' pro *#4% ngôh*deih (574) 'we or us' pro ngôh*ge 'mine' pro mn ngôh»mùhn 'we or us' pro mm gîu*ngouh (8) 'arrogant' sv ♦Bgoê mh*ngâam (11) 'incorrect' sv ihh«ngok ‘not vicious' sv ngaiehihm (3) ■ 'dangerous' sv mh»ngâam»ge 'incorrect sv ngàam (37) 'correct' sv ngâam«saai 'all correct' sv ngôhngegwai 'expensive' sv ngok (7) 'vicious' sv yâuh*.ngoi 'be affectionate' sv BJÊ ngàak (9) 'deceive' V ngàauh 'bite, gnaw' V ngahp»tàuh (5) 'nod head' v(vo)

[0-] realizations

âameâam (22) 'just now' adv jUoih 'besides' adv lihng*oih (5) 'beyond a point' adv m m yih«oi 'besides' adv aa»gwok*syxi 'Jacob' n aahn (5) 'eyes' n àahn*siu (3) 'given name' n 240 m if àhn*hôhng "bank" n àih«hîhm«slng 'dangerousness' n mm aihoseuht (2) 'art' n àuh 'cattle' n âuh*jeuhng 'idol' n àuh*pèih (4) 'leather' n àuh«si (5) 'cow manure' n oih*gwok (15) 'abroad' n oih*gwok«yàhn (4) 'foreigner' n ^ A oih*yàhn (2) 'stranger' n o*màhn (2) 'Russian' n Dahng Àhn.Siu 'a name' n *mm ük*leuhng ‘ceiling beam’ n % ôh (396) I, me' pro ôh«deîh (83) 'we, us' pro âam*sàn«di 'well fitted' sv che»ok 'wicked' sv a m gîu*ngouh 'arrogant' sv ok (4) 'vicious' sv * # âak 'deceive' V âam (13) 'correct' V ihhaoi 'to not want' V

By initial *0-

[ng-] realizations

ching«on ‘to pay a visit’ adv bô«ngôn«sih 'village name' n bôn*gôn»yuhn (7) 'district name' n bô*ngôn (4) 'village name' n bün*ngûk (2) 'move house' n dûngenga 'East Asia' n * B m fôngengük 'room' n geui«ngük (3) ‘residence’ n màhn«ngaih 'art' n 241 i# Y # naahmngadou (2) ‘Lamma Island’ n(pl) A ngaap 'duck' n m ngaat (2) 'pressure' n mil ngaat*lihk (5) 'pressure' n ngâu (2) 'a surname' n m m ngâu«jàu (2) 'Europe' n # ngoi (24) 'love' n ngoi'ji (2) 'beloved son' n ngoi«sâm (5) 'love' n ngo*jai 'goose' n ngou»mùhn (2) 'Macau' n ngûk (10) 'house' n ngük*kéi (3) 'home' n ngük*leuhng ‘ceiling beam’ n *|ë]Jl tùhng*ngûk 'roommate' n ngâi (4) 'to be short' sv ngâi«dî (2) 'shorter' sv , ngâi*jô 'get shorter' sv ngôn»chyuhn (6 ) 'safe' sv ngônl*ohk 'peace& happiness' sv pîhng*ngôn (2) 'peace' sv % ôn 'to press' v(cv) ôn*geui ‘according to’ v(cv) bât«ngôn 'unsettled' V daap»ngôn 'answer' V *a^ ^ ngâak«sàu 'shake hands' v(vo) fê ngôn (6 ) 'according to' v(cv) ngôn*dîhng (2) 'to settle' V ^iL ngôneiaahp (3) ‘to establish’ V $ c # ngôn*pàaih (5) 'to arrange' V ngôn*sàu (3) 'lay on hands' V ngôn*wai 'to comfort' V 242

[0-] realizations

6n*wan (2) 'stable' adv pihng*ngon (2) 'peace' adv ikR 0i»kahp ‘Eygpt’ n (pl) aa«lùhn 'Aaron' n aan*jau«fàahn 'lunch' n % âap (4) 'duck' n mf3 aat*lîhk 'pressure' n m n àhn«hôhng 'bank' n a*jàu (8) 'Asia' n *%m an»jau (5) 'afternoon' n âu 'a surname' n W:m âu«jàu (8) 'Europe' n bôu«ôn (2) 'village name' n daaih*bouh Aauh 'Taipo Hallow' n (pl) Gâan‘OiGàh 'a name' n mm gaan«ûk 'house' n *^Ji muhkeük (7) 'wood houses' n m y # nàahm«â«dôu (2) 'Lamma Island' n & oi (10) 'love' n oi«gwok*yàhn 'foreigner' n ôn*kèih (3) 'a given name' n mm ou«jàu (9) 'Australia' n mn ou«mùhn (18) 'Macau' n sihk«on 'eat lunch' n tùhng*ük 'roommate' n * ! i ûk (13) 'house' n *11 tt ùk*chyùn (4) 'a village' n ükvkéi (43) 'home' n *f. ùk*kéi*yàhn (7) 'family' n *f ûkvleuhng (3) ‘ceiling beam’ n *«# âam (9) 'correct' sv ài 'short' sv * # = 6 ^ àîadï 'shorter' sv âm 'dark' sv 243

ok 'vicious' sv on*chyuhn 'to be safe' sv on«di 'later' sv on*jihng (5) 'quiet' sv ôn*lohk 'enjoy' sv âak (3) 'to deceive' V âak*sâu 'shake hands' V *#A âak*yàhn (4) 'cheat someone' V Dg àu (3) 'to vomit' V âu*sei 'vomit profusely' V bât«ôn 'unsettled' V $ # on*heuhng (2) 'serene' V ônl«aahp (7) ‘to establish’ V ôn*paaih (27) 'to arrange' V ôns*âu (2) 'lay on hands’ V % f# g ôn*sâu»lâih (2) 'ceremony laying V on of hands.' ôn*wai (3) 'to comfort' V %# on*ying 'pitch a tent' V 244

Token List for variable n-/l-

*n- initial words

[n-] realizations

AM bâat*ôgh»nîhn T985 (*85)’ n chêut*nihn ’next year* n mm faahn«nôuh ’be troubled’ sv gàm«nîhn (5) ’this year’ adv gauh»nîhn ’last year' adv géi*nihn (3) *a few years’ n hô«nàhng (14) ’possibly’ adv i î i t hôu*noih ’a long time’ adv jài*néuih (2) ’children’ n ji«noih ’within a time’ adv mm kwan«nàahn (9) ’difficulty’ sv léuhnganihn (2) ’two years’ n mmm mâhn«nàahm»wâ (4) ’Min Dialect’ n nàahm (2) ’male’ n i# nàahm ’south’ n(pl) nàahm«fbng ’south’ n(pl) nàahtn«jâi (4) ’boy’ n nàahm*yàhn (2 ) ’man’ n m nàahn (3) ’difficult, hard’ sv nàhng*gau (32) ’able’ V nàhngdihk ’ability’ n *# nam (3) ’think’ V nâm«dou ’think of rv nâm*jyuh (2) ’think carefully’ V % né (9) ’question particle’ part f); néih (193) ’you’ pro néih*deih (23) ’you’ (plural) pro fT'ffl néih*muhn (2) ’you’ (plural, reading) pro * néuih (3) ’female, daughter’ n néuih*jâi 'girl’ n -k^ néuih*sing ’female’ n k A néuih*yàhn ’woman’ n 245

ngàuh»yuhk *béef n ngh*nihn (3) 'five years’ n ni«bihn 'this side' dem nl*d! (94) 'these' dem nUdouh (31) 'this place, here' dem nUdyun (2) 'this (as w/ a para.)' dem ni«gaan (4) 'this (as w/ a build.)' dem nl«gauh 'this piece' dem nî*géi (4) 'these few' dem nî*géilihn 'these few years' dem . nî«gihn (8) 'this (as w/ clothing)' dem nï*go (193) 'this (general)' dem nihn (73) 'year' n nîhn*chmg«yàhn (2) 'young people' n nîhn«gei (2) 'age' n nl.jit (2) 'this (as w\ a passage)' dem nûjuhng (2) 'this kind' dem nl*keui 'this area' dem nI*leuhng»go (4) 'these two' dem nï«sau 'this (as w/ a poem) dem dem *% Ea# nI«seigo 'these four' ni*syu 'this place, here' dem dem ♦ 9 ê -B nl*yat*yaht (3) ‘this one day’ ni*yeuhng (7) 'this [thing]' dem noih (5) 'long time' sv n i m noih'sing 'patience' nôuh*lihk 'mental capacity' n sahp*nihn 'ten years' n n mix: syûn«néuih (8) 'granddaughter' n iS^m yàhn*noih 'patience* yât*nihn 'one year' n — yât»nîhnkâp (2) 'first year’ jienoih 'within a time period’ adv yiht*naauh (4) 'bustling' sv m m yih«noih (2) 'within a time period' adv 246

[l-] Realizations

A# bâat*lihn 'eight years' adv bât*làhng (3) 'not able' V châat*lîhn (4) 'seven years' n - k A # châat#bâat#lïhn (2) 'seven or eight years' n châat»ling*lihn '1970 ('70)' n chêut«lîhn 'next year' n izii: daaih*léuih 'eldest daughter' n daih*yât«lihn (2) 'first year' n fùheléuîh (3) 'woman' n gàm«lihn . 'this year n gauh«lihn (5) 'last year' n géialîhn (2) 'which year' n gd*lihn (2) 'that year' n fc&SS gù«ièuhng (3) 'young woman' n hôî*làahm«dôu (2) 'Hainan Island' n hé*làhng (77) 'possible' adv * * # hôu*loih (4) 'long time' adv jàialéuih (20) 'children' n jàW éuîh (3) 'niece' n jieloîh (3) 'within' j ph joih»loih 'included' V làahm (6) 'male' n iSSB làahm*bouh (5) 'southern part' n làahm*ching«lihn (2) 'young people' n (#5% làahm«ging 'Nanjing (city)' n(pl) iWiS làahm*héi (2) 'South China Sea' n(pl) * ^ 1 f làahm«jâî (3) 'boy' n làahm*jàt 'nephew' n m - k làahm<»léuih (3) 'male and female' n m - k n . làahm«léuihching 'young men/women' n # làahm*nga«dôu (5) 'Lamma Island' n(pl) làahm«sing (5) 'male' n mm làahmosyün 'grandson' n m A laahm»yahn (5) 'man' n m A # làahm»yàhn»gâai 'street of men' (coined) n (pl) 247

m làahn (67) 'difficult' sv m ië làahn«gwo 'sad' sv IS laauh (10) 'scold' V m # lahng*gau (74) 'able' v(av) làhng«lihk (6) 'effort' n làhng*inh«làhng«gau 'able or not?' V làm (113) 'to think' V làhm 'tender' sv làm«dou (9) 'thought' V (rv) lâm«faan 'remember' V *-g&& T làm*faan«hâh (2) 'to think back a little' V *s±& lâm*faat 'way of thinking' V * i± A lâm«gwo (11) 'thought of V ♦ t s t T lâmahàh (22) 'think a little' V * Bièffi lâm*jyuh (11) 'think carefully' V tÊ: làm*yât*làin (2) 'think a little' V f g * Lèîh.fei (3) 'Nephi' (name) n fT' léih (415) 'you' pro léih«deih (22) 'you' (plural) pro léuhng*lihn (5) 'two years' n(tw) -k léuih (26) 'female, daughter' n k léuih ’girl' n k W léuih«ching (11) 'young woman' n k ^ léuih*ching«lihn (16) 'young woman' n k % léuîh«gà (2) 'woman's family' n kif léuîhvjâi (18) •girl' n léuîh«jât 'niece' n léuih«pàhng«yaûh (2) 'girl' n k\^ léuih*sing (4) 'female' n léuîh«yàhn (8) 'woman' n *% léuih*yîh (2) 'daughter n ll«bâ (3) 'this (as with keys)' dem li«bâan 'this (as with groups)' dem lîodi (59) 'these' dem li*douh (17) 'this place, here' dem lî«gâan (2) 'this (as w/ buildings)' dem li*géi 'these few' dem *%Ï4: lî«gihn (3) 'this (as w/clothing)' dem *%# li*go (133) 'this (general)' dem # lîhn (89) 'year' n 248

lihnabun (3) 'a year and half dem lihnochlng (7) 'young' n #-#A lihn*chlng*yàhn (2) 'young people' n lihn«gei (4) 'age' n Uhn*gei daaih ' old, elderly' n lîhn*hèng 'young' sv lihn*kâp ’grade level’ n lihn*méih 'end of year' n (tw) *%# lï»sâu 'this (as w/ poems)' dem li*wâi 'this [person]' hon.' dem lî*yàt«fbng»mihn 'this aspect' dem lûyeuhng (3) 'this [thing]' dem * it loih (28) 'long [time]' sv m iâ loîh*bihn 'within' loc * it 0^ loih*di (3) 'longer' sv loih*sam (2) 'within one's heart' loc ®t1£ loih*sîng 'patience' n 88 lôuh 'brain' n mil Iduhvlihk (8) 'effort' n luhkalihn (4) 'six years' n m # lùhng*chyûn 'farming village' n lùhng*fù • (4) 'farmer' n

lyûhn 'warm' sv
mh·làhng·gau (7) 'not able' V
ngh·lihn 'five years' n
sahp·bâat·lihn 'eighteen years' n
sahp·lîhn (4) 'ten years' n
sahp·yât·lihn 'eleven years' n
sei·lihn (3) 'four years' n
sihng·lihn 'adult' n
siu·lihn 'youth, adolescent' n
syûn·leuih 'granddaughter' n
yâhn·loih (2) 'patience' n
yât·lîhn (10) 'one year' n
yat·lihn·kâp 'first year, first grade' n
yîh·léuih (5) 'children' n
yiht·laauh (8) 'bustling' sv
Yuht·làahm (4) 'Vietnam' n

*n- initial words: Arranged by word class

hô·nàhng (14) 'possibly' adv
hou·noih 'a long time' adv
ji·noih 'within a time' adv
yih·noih (3) 'within a period' adv
ni·bihn 'this side' dem
ni·di (94) 'these' dem
ni·douh (31) 'this place, here' dem
ni·dyun (2) 'this (as w/ a para)' dem
ni·gaan (4) 'this (as w/ a build.)' dem
ni·géi (4) 'these few' dem
ni·géi·lîhn 'these few years' dem
ni·gîhn (8) 'this (as w/ clothing)' dem
nï·go (193) 'this (general)' dem
ni·jit (2) 'this (as w/ a passage)' dem
nî·juhng (2) 'this kind' dem
nî·kèuih 'this area' dem
nî·léuhng·go (4) 'these two' dem
ni·sàu 'this (as w/ a poem)' dem
nî·sei·go 'these four' dem
ni·syu 'this place, here' dem
ni·yât·yaht (3) 'this day' dem
ni·yeuhng (7) 'this [thing]' dem
bâat·ngh·nîhn '1985 ('85)' n
chëut·nîhn 'next year' n
géî·nihn (3) 'a few years' n
gàm·nihn (5) 'this year' n
gauh·nihn 'last year' n
jâi·néuih (2) 'children' n
mahn·naàhm·wâ (4) 'Min Dialect' n
naahm 'south' n(pl)
nàahm (2) 'male' n
nàahm·fong 'south' n(pl)
nàahm·jàî (4) 'boy' n

nàahm·yàhn (2) 'man' n
nàhng·lihk 'ability' n
néuih (3) 'female, daughter' n
néuih·jài 'girl' n
néuih·sing 'female' n
néuih·yàhn 'woman' n
ngàuh·yuhk 'beef' n
ngh·nîhn (3) 'five years' n
nihn (73) 'year' n
nihn·chîng·yahn (2) 'young people' n
nîhn·géi (2) 'age' n
noih·sing 'patience' n
nôuh·lihk 'mental capacity' n
sahp·nihn 'ten years' n
syùn·néuih (8) 'granddaughter' n
yâhn·noih 'patience' n
yât·nihn 'one year' n
yât·nîhn·kâp (2) 'first year' n
né (9) 'quest. part.' part
néih (193) 'you' pro
néih·deih (23) 'you' (plural) pro
néih·muhn (2) 'you' (plural, lit) pro
faahn·nôuh 'be troubled' sv
kwaan·nàahn (9) 'difficulty' sv
nàahn (3) 'difficult, hard' sv
noih (5) 'long time' sv
yiht·naauh (4) 'bustling' sv
nàhng·gau (32) 'able' V
nàm (3) 'think' V
nàm·dou 'think of' V(rv)
nâm·jyuh (2) 'think carefully' V

[l-] realizations

hô·làhng (77) 'possible' adv
hôu·loih (4) 'long time' adv
ji·loih (3) 'within' adv(ph)
li·ba (3) 'this (as with keys)' dem
li·baan 'this (as with groups)' dem
li·di (59) 'these' dem
li·douh (17) 'this place, here' dem
li·gaan (2) 'this (as w/ buildings)' dem
li·gei 'these few' dem
li·gihn (3) 'this (as w/ clothing)' dem
li·go (133) 'this (general)' dem
li·sau 'this (as w/ poems)' dem
li·wai 'this [person] hon.' dem
li·yat·fong·mihn 'this aspect' dem
li·yeuhng (3) 'this [thing]' dem
loih·sam (2) 'within one's heart' loc
bàat·lîhn 'eight years' n
châat·lîhn (4) 'seven years' n
chaat·ling·lîhn '1970 ('70)' n
châat·baat·lihn (2) 'seven or eight years' n
cheut·lîhn 'next year' n
daaih·léuîh 'eldest daughter' n
daih·yât·lîhn (2) 'first year' n
fû·léuih (3) 'woman' n
gàm·lihn 'this year' n
gauh·lihn (5) 'last year' n
géi·lihn (2) 'which year' n
gô·lîhn (2) 'that year' n
gù·leùhng (3) 'young woman' n
hôi·làahm·dôu (2) 'Hainan Island' n
jai·léuih (20) 'children' n
jât·léuih (3) 'niece' n
làahm (6) 'male' n
làahm·bouh (5) 'southern part' n
làahm·ching·lihn (2) 'young people' n
làahm·gîng 'Nanjing (city)' n(pl)
làahm·hôi (2) 'South China Sea' n(pl)
làahm·jâi (3) 'boy' n
làahm·jât 'nephew' n
làahm·léuih (3) 'male and female' n
làahm·léuih·chîng 'young men/women' n

làahm·nga·dôu (5) 'Lamma Island' n(pl)
làahm·sing (5) 'male' n
làahm·syùn 'grandson' n
làahm·yàhn (5) 'man' n
làahm·yàhn·gâai 'street of men' n(pl)
làhng·lihk (6) 'effort' n
Lèih·fèi (3) 'Nephi' (name) n
léuhng·lihn (7) 'two years' n(tw)
léuih (24) 'female, daughter' n
léuih (4) 'girl' n
léuih·ching (11) 'young woman' n
léuih·ching·lîhn (16) 'young woman' n
léuih·gà (2) 'woman's family' n
léuih·jài (18) 'girl' n
léuih·jat 'niece' n
léuih·pàhng·yaûh (2) 'girl' n
léuih·sing (4) 'female' n
léuih·yàhn (6) 'woman' n
léuih·yih (2) 'girl, daughter' n
lîhn (89) 'year' n
lîhn·bun (3) 'a year and half' n
lihn·ching (7) 'young' n
lihn·ching·yàhn (2) 'young people' n
lihn·géi (4) 'age' n
lîhn·géi·dàaih 'how old?' n
lihn·meih 'end of year' n
loih·sing 'patience' n
louh 'brain' n
lóuh·lihk (8) 'effort' n
luhk·lîhn (4) 'six years' n
lühng·chyûn 'farming village' n
lûhng·fu (4) 'farmer' n
ngh·lîhn 'five years' n
sahp·bâat·lihn 'eighteen years' n
sahp·lîhn (4) 'ten years' n
sahp·yât·lîhn 'eleven years' n
sei·lîhn (3) 'four years' n
sîhng·lîhn 'adult' n

siu·lihn 'youth, adolescent' n
syûn·léuih 'granddaughter' n
yâhn·loih (2) 'patience' n
yât·lîhn (10) 'one year' n
yât·lihn·kap (2) 'first grade' n
yih·léuih (5) 'children' n
Yuht·làahm (4) 'Vietnam' n
léih (415) 'you' pro
léih·deih (22) 'you' (plural) pro
làahn (67) 'difficult' sv
làahn·gwo 'sad' sv
làhm 'tender' sv
lihn·hing 'young' sv
loih (28) 'long [time]' sv
loih·di (3) 'longer' sv
lyûhn 'warm' sv
yiht·laauh (8) 'bustling' sv
bât·làhng (3) 'not able' V
joih·loih 'included' V
laauh (10) 'scold' V
làhng·gau (74) 'able' v(av)
làhng·mh·làhng·gau 'able or not?' v(av)
làm (113) 'to think' v
làm·dou (9) 'thought' v(rv)
làm·faan 'remember' V
lam·faan·hah (2) 'think back' V
làm·faat 'way of thinking' V
lâm·gwo (11) 'thought of' V
làm·hah (22) 'think a little' V
làm·jyuh (11) 'think carefully' V
làm·yàt·làm (2) 'think a little' V
mh·làhng·gau (7) 'not able' V

APPENDIX H


List of Chinese Characters for Place Names

Bao’an
Bobai
Chaozhou
Conghua
Dongguan
Doumen
Enping
Foshan
Gaoming
Gaozhou
Guangzhou (Canton)
Haifeng
Heshan
Huaxian
Hong Kong (City)
Hong Kong (New Territories)
Huizhou
Jianmen
Lianzhou
Lufeng
Lutian
Leizhou
Longdu
Lutian
Macau (Aumen)
Nanlang
Panyu
Qianshan
Qingxi
Rongxian
Shangheng
Shantou
Shunde
Sanshui
Shenzhen
Taishan
Wuzhou
Xinhui
Yinzhou
Yulin
Zengcheng
Zhongshan (Shiqi)
Zhuhai
Kaiping

LIST OF REFERENCES

The titles of Chinese-language sources are given in Pinyin romanization, followed by a translation in square brackets []. When an author is known by a romanized form of his or her name, I have used that form; otherwise I have given the name in Pinyin as well.

American Chamber of Commerce in Hong Kong. 1986. Living in Hong Kong. Edited by Alan Moores. Hong Kong: Cameron Printing Company.

Barale, Catherine. 1982. A Quantitative Analysis of the Loss of Final in Beijing Mandarin. Ph.D. Dissertation. University of Pennsylvania.

Bauer, Robert. 1982. Cantonese Sociolinguistic Patterns. Ph.D. Dissertation. University of California, Berkeley.

Bauer, Robert. 1983. Cantonese Sound Change Across Subgroups of the Hong Kong Speech Community. Journal of Chinese Linguistics 11.2:301-354.

Bauer, Robert. 1986. The Microhistory of Sound Change in Hong Kong Cantonese. Journal of Chinese Linguistics 14.1:1-41.

Beijing Daxue. 1962. Hanyu Fangyan Zihui [Chinese Dialect Syllabary]. Beijing: Wenzi Gaige Chubanshe.

Beijing Daxue. 1964. Hanyu Fangyan Cihui [Chinese Dialect Survey]. Beijing: Wenzi Gaige Chubanshe.

Biber, Douglas. 1986. Spoken and Written Textual Dimensions in English. Language 62.2:384-414.

Bloomfield, Leonard. 1933. Language. New York: Holt, Rinehart and Winston, Inc.

Boissevain, J. 1974. Friends of Friends: Networks, Manipulators and Coalitions. Oxford: Blackwell.

Boissevain, J. and J.C. Mitchell (eds.) 1973. Network Analysis: Studies in Human Interaction. The Hague: Mouton.


Bourgerie, Dana S. 1987a. PARTICLES OF UNCERTAINTY: A Discourse Approach to the Cantonese Final Particle. M.A. Thesis. The Ohio State University.

Bourgerie, Dana S. 1987b. A Case of Sociolinguistic Variation in Cantonese. The Ohio State University. Unpublished Paper.

Butler, Christopher. 1985. Statistics in Linguistics. Oxford: Basil Blackwell Inc.

Cameron, Deborah and Jennifer Coates. 1985. Some Problems in the Sociolinguistic Explanation of Sex Differences. Language and Communication 5.3:143-51.

Cedergren, Henrietta and David Sankoff. 1974. Variable Rules: Performance as a Statistical Reflection of Competence. Language 50:333-355.

Chalmers, J. 1907. English-Cantonese Dictionary, 7th ed.

Chan, Marjorie K.M. 1987. Post-Stopped Nasals in Chinese: An Areal Study. UCLA Working Papers in Phonetics 68:73-119.

Chan, Marjorie K.M. 1985. Fuzhou Phonology: A Non-Linear Analysis of Tone and Stress. Ph.D. Dissertation, University of Washington.

Chan, Marjorie K.M. 1980. Zhongshan Phonology: A Synchronic and Diachronic Analysis of a Yue (Cantonese) Dialect. M.A. Thesis, University of British Columbia.

Chan, Marjorie K.M. and James H.Y. Tai. 1989. A Critical Review of Jerry Norman's Chinese. Journal of the Chinese Language Teacher's Association 24.1:43-61.

Chan, Mimi and Helen Kwok. 1982. A Study of Lexical Borrowing from English in Hong Kong Chinese. Centre of Asian Studies, University of Hong Kong.

Chan, Mimi and Helen Kwok. 1985. A Study of Lexical Borrowing From Chinese into English With Special Reference To Hong Kong. Hong Kong: Centre of Asian Studies. Occasional Papers and Monographs, No. 62.

Chao, Yuen Ren. 1947. Cantonese Primer. Cambridge: Harvard University Press.

Chao, Yuen Ren. 1930. A System of Tone Letters. Le Maître Phonétique, troisième série, 30, 24-27. Reprinted in Fangyan 1980, 2, 81-83.

Chao, Yuen Ren. 1968. A Grammar of Spoken Chinese. Berkeley: University of California Press.

Chen, Matthew and Hsin-I Hsieh. 1971. The Time Variable in Phonological Change. Journal of Linguistics 7:1-14.

Chen, Matthew and William S.Y. Wang. 1975. Sound Change: Actuation and Implementation. Language 51:255-81.

Cheung, Kwan-hin. 1986. The Phonology of Present-Day Cantonese. Ph.D. Dissertation, University College.

Cheung, Samuel Hung-nin. 1972. Xianggang Yueyu Yufa Yanjiu [A Study of Hong Kong Cantonese Grammar]. Hong Kong: University of Hong Kong.

Coates, Jennifer. 1986. Women, Men and Language. London: Longman.

DeFrancis, John. 1984. The Chinese Language: Fact and Fantasy. Honolulu: University of Hawaii Press.

Cowles, Roy T. 1965. The Cantonese Speaker's Dictionary. Hong Kong: University of Hong Kong Press.

Di, Shiyu. 1986. Hanyu Fangyan Yu Fangyan Diaocha [Chinese Dialects and Dialect Study]. Zhongxing: Xinan Shifan Daxue Chubanshe.

Fasold, Ralph. 1978. Language Variation and Linguistic Competence. In David Sankoff (ed.), 85-96.

Gao, Huanian. 1984. Guangzhou Fangyan Yanjiu [A Study of the Cantonese Dialect]. Hong Kong: Commercial Press.

Gibbons, John. 1987. Code Mixing and Code Choice: A Hong Kong Case Study. Clevedon: Multilingual Press.

Guojia Tongjiju [National Statistical Bureau]. 1984. Zhongguo Tongji Nianjian [China Statistical Yearbook]. Beijing: Zhongguo Tongji Chubanshe [China Statistical Publishing House].

Guy, Gregory. 1980. Variation in the Group and Individual: The Case of Final Stop Deletion. In Labov (ed.), 1-36.

Hashimoto, Oi-kan Yue Ann. 1972. Studies in Yue Dialects 1: Phonology of Cantonese. Cambridge: Cambridge University Press.

Hockett, Charles F. 1958. A Course in Modern Linguistics. New York: MacMillan.

Hockett, Charles F. 1950. Age-Grading and Linguistic Continuity. Language 26:449-59. p. 16.

Hong Kong 1976 By-Census, Main Report. Table 103: Total Population by Place of Birth by Age (single) by Sex. Hong Kong: Census and Statistics Department.

Hong Kong Population and Housing Census. 1981. Basic Tables. Hong Kong: Census and Statistics Department.

Horvath, Barbara M. 1985. Variation in Australian English. Cambridge: Cambridge University Press.

Hsieh, Hsin-I. 1972. Lexical Diffusion: Evidence from Child Language Acquisition. Glossa 6:89-104.

Hu, Mingyang. 1987. Beijing Nuguoyin Diaocha [An Investigation of Women's Speech in Pekingese]. Yuwen Jianshe [Language Construction], no. 1.

Huang, Parker Po-fei. 1970. Cantonese Dictionary: Cantonese-English, English-Cantonese. New Haven: Yale University Press.

Hudson, R.A. 1980. Sociolinguistics. Cambridge: Cambridge University Press.

Kerswill, P. 1984. Levels of Linguistic Variation in Durham. In Cambridge Papers in Phonetics and Experimental Linguistics 3. Department of Linguistics, University of Cambridge.

Kwok, Helen. 1984. Sentence Particles in Cantonese. Centre of Asian Studies, University of Hong Kong.

Labov, William (ed.). 1980. Locating Language in Time and Space. New York: Academic Press.

Labov, William. 1966. The Social Stratification of English in New York City. Washington, D.C.: Center for Applied Linguistics.

Labov, William. 1972a. Language in the Inner City. Philadelphia: University of Pennsylvania Press; Oxford: Blackwell.

Labov, William. 1972b. Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.

Labov, William. 1990. The Search for Lexical Diffusion: Computational Analysis of Geographic Data. Presentation at the 2nd Northeast Conference on Chinese Linguistics (NECCL2). Philadelphia, PA. May 4-6, 1990.

Lau, Sidney. 1977. A Practical Cantonese-English Dictionary. Hong Kong: The Government Printer.

Lee, Thomas. 1983. The Vowel System of Two Varieties of Cantonese. UCLA Working Papers in Phonetics 57:97-114.

Li, Fang-kuei. 1939. Languages and Dialects of China. The Chinese Year Book 1938-1939. Shanghai: Commercial Press, pp. 44-46. (Reprinted in Journal of Chinese Linguistics (1979) 4:287-293.)

Li, Rongming. 1959. Chaozhou Fangyan. Beijing: Zhonghua Shuju.

Liao, Chiu-Chung. 1976. The Propagation of Sound Change: A Case Study in Chinese Dialects. Ph.D. Dissertation. University of California, Berkeley.

Light, Timothy. 1982. On Being De-ing: How Women's Language is Perceived in Chinese. In Computational Analyses of African and Asian Languages, no. 19, 21-49.

Luke, Kang-kwong. 1988. A Conversation Analytic Approach to the Study of Utterance Particles in Cantonese. Ph.D. Dissertation. York University.

Macau Government. 1962. Dictionario Chines-Portuges. Macau: Edicao Do Governo Da Provincia.

Macaulay, R.K.S. 1977. Language, Social Class and Education. Edinburgh: Edinburgh UP.

Meyer, Bernhard P. and Theodore P. Wempe. 1947. The Student's Cantonese-English Dictionary, 3rd ed. New York: Field Afar Press.

Milroy, Lesley. 1980. Language in Social Networks. Oxford: Basil Blackwell.

Milroy, Lesley. 1987. Observing & Analyzing Natural Language Data. Oxford: Basil Blackwell.

Mitchell, J. Clyde (ed.) 1969. Social Networks in Urban Situations. Manchester: Manchester University Press.

Mitchell, J. Clyde. 1973. Networks, Norms and Institutions. In Boissevain and Mitchell (eds.), 15-36.

Neu, H. 1980. Ranking of Constraints on /t,d/ Deletion in American English: A Statistical Analysis. In Labov (ed.), 37-54.

Norberg, B. 1980. Sociolinguistic Fieldwork Experiences of the Unit for Advanced Studies in Modern Swedish. FUMS Report no. 90. Uppsala: FUMS.

Norman, Jerry. 1988. Chinese. Cambridge: Cambridge University Press.

Pan, Peter G. n.d. Patterns of Phonological Variations in Hong Kong Cantonese Speech.

Pan, Peter G. 1981. Prestige Forms and Phonological Variation in Hong Kong Cantonese Speech. M.A. Thesis. University of Hong Kong.

Qiao, Yannong (ed.) 1966. Guangzhouhua Kouyaci de Yanjiu [A Study of Cantonese Colloquial Vocabulary]. Hong Kong: Overseas Chinese Languages Publishing Company.

Rao, Bingcai. 1980. Yue Fangyan Ziyinde Dingyin Wenti. Yuwen Zazhi. 5:42-45.

Romaine, Suzanne. 1978. Post-vocalic /r/ in Scottish English: Sound Change in Progress. In Trudgill (ed.), 144-57.

Romaine, Suzanne. 1980. A Critical Overview of the Methodology of Urban British Sociolinguistics. English World Wide 1.2:163-98.

Ryan, F. Barbara, et al. 1985. Minitab. Boston: PWS-Kent Publishing Company.

Sanders, Robert. 1986. Diversity and Frequency as a Reflection of Social Factors: The Application of Variable Rules to the Analysis of Disposal in the Beijing Speech Community. Ph.D. Dissertation. University of California, Berkeley.

Sankoff, David (ed.). 1978. Linguistic Variation: Models and Methods. New York: Academic Press.

Sankoff, Gillian. 1980. The Social Life of a Language. Philadelphia: University of Pennsylvania Press.

Sankoff, Gillian. 1980. A Quantitative Paradigm for the Study of Communicative Competence. In G. Sankoff (1980), 47-79.

Sargent, Laurent. 1982. Phonologie du Dialecte Hakka de Sung Him Tong. Hong Kong: Chiu Ming Publishing Co. Ltd.

She, Ping Zhao. 1982. Tongyin Zihui [A Syllabary of Homophones]. Hong Kong: Guanghua Tushu Chuban Gongsi [Guanghua Book Publishing Company].

Shuy, Roger W., Walter A. Wolfram and William K. Riley. 1968. Field Techniques in an Urban Language Study. Arlington: Center for Applied Linguistics.

T'sou, Benjamin K. 1976. Language Loyalty among Minority Groups in Hong Kong. Papers from the Asian Round Table Conference on Chinese Language and Linguistics, Hong Kong, Benjamin K. T'sou, ed.

Thomason, Sarah Grey and Terrence Kaufman. 1988. Language Contact, Creolization, and Genetic Linguistics. Berkeley: University of California Press.

Trudgill, Peter. 1974. The Social Differentiation of English in Norwich. Cambridge: Cambridge University Press.

Walters, Keith. 1989a. Social Change and Linguistic Variation in Korba, A Small Tunisian Town. Ph.D. Dissertation, University of Texas at Austin.

Walters, Keith. 1989b. Women, Men and Linguistic Variation in the Arab World. Presentation at The Third Annual Symposium on Arabic Linguistics. Salt Lake City. March 1989.

Wang, William S-Y and Chin-chuan Cheng. 1970. Implementation of Phonological Change: The Shuang-feng Chinese Case. Proceedings from the Annual Meeting of the Linguistic Society 6:552-9.

Wang, William S-Y. 1969. Competing Changes as a Cause of Residue. Language 45.1:9-25.

Wells, H.R. 1931. An English Cantonese Dictionary. Hong Kong: Kelly and Walsh, LTD.

Whitaker, K. P. K. 1952. Characterization of the Cantonese Dialect with Special Reference to its Modified Tones. Ph.D. Dissertation, University of London.

Williams, S. Wells. 1856. Tonic Dictionary in the Cantonese Dialect. Canton: Office of Chinese Repository.

Wolfson, Nessa. 1982. CHP: The Conversational Historic Present in American English Narrative. Dordrecht: Foris Publications.

Wong, S.L. 1987. A Chinese Syllabary Pronounced According to the Dialect of Hong Kong. Hong Kong: Zhonghua Shuju.

Woods, Anthony, et al. 1986. Statistics in Language Studies. University of Reading: Cambridge University Press.

Yang, Shi-feng. 1984. Sichuan Fangyan Diaocha Baogao [Report on a Survey of the Dialects of Sichuan]. 2 Volumes. Special Publication Number 82, Institute of History and Philology, Academia Sinica.

Yeung, Suk-Wah Helen. 1980. Some Aspects of Phonological Variations in the Cantonese Spoken in Hong Kong. M.A. Thesis. University of Hong Kong.

Yuan, Jiahua. 1983. Hanyu Fangyan Gaiyao [Outline of Chinese Dialects]. Beijing: Wenzi Gaige Chubanshe.

Yun, Weili. 1987. Hainan Fangyan. [Hainan Dialect]. Macau: University of East Asia.

Zhan, Bohui and Yat-Shing Cheung, (eds.) 1988. A Survey of Dialects in the Pearl River Delta, Vol. 2. Hong Kong: New Century Publishing House.

Zhan, Bohui and Yat-Shing Cheung, (eds.) 1987. A Survey of Dialects in the Pearl River Delta, Vol. 1. Hong Kong: New Century Publishing House.

Zhan, Bohui. 1980. Xiandai Hanyu Fangyan [Modern Chinese Dialects]. Hubei, China: Commercial Press.