
Creation and Analysis of a Corpus of Internet Relay Chat

by

Wernfrid Doell, B.A.

A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfilment of the requirements for the degree of Master of Arts

School of Linguistics and Applied Language Studies
Carleton University
Ottawa, Ontario
15 May 2000

© 2000

National Library of Canada (Bibliothèque nationale du Canada)
Acquisitions and Bibliographic Services

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION ... 1
CHAPTER 2 RESEARCH REVIEW ... 4
  Technical introduction ... 4
  Ferrara et al. (1991) ... 9
  Murray, D. E. (1991a) ... 11
  Lundstrom, P. (1995) ... 12
  Collot, M. & Belmore, N. (1996) ... 14
  Wilkins, H. (1991) ... 15
  Bechar-Israeli, H. (1995) ... 17
  Bays, H. (1999) ... 18
  Lunsford, W. (1996) ... 19
  Werry, C. C. (1996) ... 19
  Paolillo, J. (1999) ... 21
  Doell, W. (1997) ... 23
  Doell, W. (1999) ... 24
  Summary ... 25
CHAPTER 3 METHODOLOGY ... 27
  Data collection and corpus organization ... 27
  Channel selection ... 29
  Privacy ... 31
  Population ... 32
  Study organization ... 33
CHAPTER 4 DESCRIPTIVE STATISTICS ... 35
  Basics ... 35
  Correlations between variables ... 39
  100 most frequent words ... 40
  Correlations between the 100 most frequent words ... 47
  Frequency distributions in the 100 most frequent words ... 49
  Type/token ratios ... 50
  ANOVA and post hoc tests ... 51
  Summary ... 55
CHAPTER 5 ANALYSIS OF CIRC FEATURES COMPARABLE TO SPEECH AND TO WRITING ... 56
  Deictics ... 56
    Personal pronouns ... 56
    Demonstrative pronouns and determiners ... 71
      'Here' ... 73
      'This' ... 76
      'That' ... 77
      'These' and 'those' ... 78
      'Then' ... 79
    Summary ... 80
  Questions ... 80
    Wh-questions ... 81
    Yes/no questions and response items ... 84
    Summary ... 88
  Hedging ... 88
  Contractions ... 90
    Contractions with 'not' ... 90
    Contractions with 'have' ... 94
    Contractions with 'is' and 'has' ... 95
    Contractions with 'will' ... 96
  Dialects ... 98
  Ellipsis ... 101
  Colloquial usage ... 102
  Syntactical complexity ... 104
    Subordination and coordination ... 104
    Passives ... 106
  Greeting and leave-taking ... 107
    Greeting ... 107
    Leave-taking ... 110
CHAPTER 6 ERSATZ: FEATURES OF IRC TALK WHICH ATTEMPT TO SIMULATE SPEECH ... 111
  Emoticons ... 111
  Action messages ... 115
  Backchanneling ... 117
  Phonetic spelling ... 119
  Paralinguistic and suprasegmental cues ... 121
CHAPTER 7 INNOVATION AND CREATIVITY ... 123
  Nickname morphism ... 123
  ASCII art ... 126
  Neologisms ... 127
  Innovative spellings ... 129
  Abbreviation ... 130
CHAPTER 8 CONCLUSION ... 136
  Results of the study ... 136
  Applications ... 138
  Suggestions for further research ... 140
CHAPTER 9 APPENDIX ... 147
  Appendix A ... 147
  Appendix B ... 151
  Appendix C ... 153
  Appendix D ... 155
  Appendix E ... 156
  Appendix F ... 157
  Appendix G ... 160
  Appendix H ... 160
  Appendix I ... 171
  Appendix J ... 172

ACKNOWLEDGEMENTS

I would like to thank Kassia Balian for her unwavering support for this project and for the many hours she spent proofreading the various drafts of this paper. Likewise, I would like to thank my supervisor, Professor Helmut Zobl, for his invaluable advice and direction. Further thanks go to Hillary Bays for sharing her enthusiasm for IRC research with me, Professor Leslie Siegrist for his technical troubleshooting of the wconcord software, Lynne Young and Richard Darville for their contributions in the early stages of the project, Janna Fox and Martin Kessner for their help with the statistical analysis, and Lola and Dude for their moral support.

ABSTRACT

This thesis reports the creation and subsequent computer-aided analysis of a large-scale corpus of Internet Relay Chat (henceforth IRC). The corpus was created from log files of several IRC sessions. It contains one million running words collected in ten IRC chat rooms, or channels. For the analysis of the corpus, concordancing software was used to extract words and their frequencies. A statistical analysis of some basic measures obtained with the concordancer yielded insights into the internal structure of the corpus. For example, the data from one channel, i.e. a subcorpus of 100,000 words, was found to be significantly different from the others. The statistical analysis also made it possible to pinpoint influences from dirty data. The analysis of the corpus concentrated on features of IRC which have been identified by prior