English Corpus Linguistics: an Introduction - Charles F

Cambridge University Press 0521808790 - English Corpus Linguistics: An Introduction - Charles F. Meyer Index More information Index Aarts, Bas, 4, 102 Biber, Douglas, et al. (1999) 14 adequacy, 2–3, 10–11 Birmingham Corpus, 15, 142 age, 49–50 Blachman, Edward, 76–7 Altenberg, Bengt, 26–7 BNC see British National Corpus AMALGAM Tagging Project, 86–7, 89 Brill, Eric, 86 American National Corpus, 24, 84, 142 Brill Tagger, 86–8 American Publishing House for the Blind British National Corpus (BNC), 143 Corpus, 17, 142 annotation, 84 analyzing a corpus, 100 composition, 18, 31t, 34, 36, 38, 40–1, 49 determining suitability, 103–7, 107t copyright, 139–40 exploring a corpus, 123–4 planning, 30–2, 33, 43, 51, 138 extracting information: defining parameters, record keeping, 66 107–9; coding and recording, 109–14, research using, 15, 36 112t; locating relevant constructions, speech samples, 59 114–19, 116f, 118f tagging, 87 framing research question, 101–3 time-frame, 45 future prospects, 140–1 British National Corpus (BNC) Sampler, see also pseudo-titles (corpus analysis 139–40, 143 case study); statistical analysis Brown Corpus, xii, 1, 143 anaphors, 97 genre variation, 18 annotation, 98–9 length, 32 future prospects, 140 research using, 6, 9–10, 12, 42, 98, 103 grammatical markup, 81 sampling methodology, 44 parsing, 91–6, 98, 140 tagging, 87, 90 part-of-speech markup, 81 time-frame, 45 structural markup, 68–9, 81–6 see also FROWN (Freiburg–Brown) tagging, 86–91, 97–8, 111, 117–18, 140 Corpus types, 81 Burges, Jen, 52 appositions, 42, 98 Burnard, Lou, 19, 82, 84, 85–6 see also pseudo-titles (corpus analysis case study) Cambridge International Corpus, 15, 143 ARCHER (A Representative Corpus of Cambridge Learners’ Corpus, 15, 143 English Historical Registers), 21, Canterbury Project, 79, 143 22, 79 n6, 140, 142 Cantos, Pascual, 33 n2 Aston, Guy, 19 Chafe, Wallace, 3, 32, 52, 72, 85 AUTASYS Tagger, 87 CHAT system (Codes for the Human Analysis of Transcripts), 26, 113–14 
Bank of English Corpus, 15, 96, 142 Chemnitz Corpus, 23, 143 BBC English Dictionary, 15, 16–17 CHILDES (Child Language Data Exchange Bell, Alan, 100, 101–3, 104, 108, 110, 131 System) Corpus, xiii, 26, 113, 144 Bergen Corpus of London Teenage English Chomsky, Noam, 2, 3 see COLT Corpus CIA (contrastive interlanguage analysis), 26 Biber, Douglas, 10, 19–20, 22, 32, 33, CLAN software programs, 26 36, 39–40, 41, 42, 52, 78, 121, CLAWS tagger, 25, 87, 89–90 122, 126 Coates, Jennifer, 12, 13 collecting data Corpus of Middle English Prose and Verse, 144 general considerations, 55–6 Corpus of Spoken Professional English, 71, record keeping, 64–6 144 speech samples, 56; broadcasts, 61; corpus-based research, 11 future prospects, 139; microphones, 60; contrastive analysis, 22–4 “natural” speech, 56–8, 59; permission, grammatical studies, 11–13 57; problems, 60–1; recording, 58–9; historical linguistics, 20–2 sample length, 57–8; tape recorders, 59–60 language acquisition, 26–7 writing samples: copyright, 38, 61–2, 79 n6, language pedagogy, 27–8 139–40; electronic texts, 63–4; future language variation, 17–20 prospects, 139; sources, 62–4 lexicography, 14–17 see also sampling methodology limitations, 124 Collins, Peter, xii–xiii natural language processing (NLP), xiii, Collins COBUILD English Dictionary, 15 24–6 Collins COBUILD Project, 14, 15 reference grammars, 13–14 COLT Corpus (Bergen Corpus of London translation theory, 22–4 Teenage English), xiii–xiv, 18, 49, 142 Crowdy, Steve, 43, 59 competence vs. 
performance, 4 Curme, G., 13 computerizing data directory structure, 67, 68f data-driven learning, 27–8 file format, 66–7 de Haan, Pieter, 97–8 markup, 67, 68–9 see also annotation descriptive adequacy, 2, 3 speech, see speech, computerizing diachronic corpora, 46 written texts, 78–80, 139 dialect variation, 51–2 concordancing programs dictionaries, 14–17 KWIC format, 115–16, 116f Du Bois, John, 32, 52, 85 for language learning, 27–8 Dunning, Ted, 132 “lemma” searches, 116 programs, 115, 117, 150–1 EAGLES Project see Expert Advisory Group with tagged or parsed corpus, 117–18 on Language Engineering Standards, The uses, 16, 86, 114 Ebeling, Jarle, 23 “wild card” searches, 116–17 education, 50 Conrad, Susan, 126 Ehlich, Konrad, 77 contrastive analysis, 22–4 Electronic Beowulf, The, 21, 144 contrastive interlanguage analysis (CIA), 26 electronic texts, 63–4 Cook, Guy, 72, 86 elliptical coordination copyright, 38, 44, 57, 61–2, 79 n6, 139–40 frequency, 7, 12 Corpora Discussion List, 144 functional analysis, 6–11 corpus (corpora) genres, 6, 9–10 balanced, xii position, 6–7 construction see planning corpus repetition in speech, 9 construction serial position effect, 7–8, 8t definitions, xi–xii speech vs. writing, 8–9 diachronic, 46 suspense effect, 7–8, 8t historical, 20–2, 37–8, 46, 51, 78–9 empty categories, 4–5 learner, 26–7 ENGCG Parser, 96 monitor, 15 EngCG-2 tagger, 88 multi-purpose, 36 EngFDG parser, 91, 93–4, 93–4 n8, 96 parallel, 22–4 English–Norwegian Parallel Corpus, 23, parsed, 96 62, 144 resources, 142–9 ethnographic information, 65–6 special-purpose, 36 see also sociolinguistic variables synchronic, 45–6 Expert Advisory Group on Language corpus linguistics, xi, xiii–xiv, 1–2, 3–4 Engineering Standards, The (EAGLES), Corpus of Early English Correspondence, 22, xi, 84, 144 37, 144 explanatory adequacy, 2, 3, 10–11
Extensible Markup Language see XML ICE (International Corpus of English), 146 Eyes, Elizabeth, 91 annotation, 82–3, 84, 85, 87, 90 composition, 34, 35t, 36, 38, 39, 40–2, 104 Fernquest, Jon, 114 computerizing data, 72, 73 Fillmore, Charles, 4, 17 copyright, 38, 44 Finegan, Edward, 22 criteria, 50 Fletcher, P., 121–2 record keeping, 66 FLOB (Freiburg–Lancaster–Oslo–Bergen) regional components, 104, 105–6, 106t, Corpus, 21, 45, 145 110, 123, 124 “frame” semantics, 17 research using, 6, 9 see also pseudo-titles Francis, W. Nelson, 1, 88 (corpus analysis case study) FROWN (Freiburg–Brown) Corpus, 21, 145 sampling, 44, 56 FTF see fuzzy tree fragments time-frame, 45 functional descriptions of language see also ICECUP; ICE-GB; ICE-USA elliptical coordination, 6–11, 8t, 12 ICE Markup Assistant, 85, 86 repetition in speech, 9 ICE Tree, 95 voice, 5–6 ICECUP (ICE Corpus Utility Program), 19, fuzzy tree fragments (FTF), 119, 119f 116, 119, 146 ICE-East Africa, 106, 106t, 107t, 110, Gadsby, Adam, 27 123t, 124 Garside, Roger, 88–9 ICE-GB, 146 Gavioli, Laura, 28 annotation, 25, 83–4, 86, 92–3, 92f, 96, gender, 18, 22, 48–9 117–18, 118f, 140 generative grammar, 1, 3–5 composition, 106t genre variation, 18, 19–20, 31t, 34–8, 35t, 40–2 computerizing data, 73 Gillard, Patrick, 27 criteria, 50 government and binding theory, 4–5 record keeping, 64–5 grammar research using, 14, 19, 115–16, 116f generative, 1, 3–5 see also pseudo-titles (corpus analysis universal, 2–3 case study) “Grammar Safari”, 28 ICE-Jamaica, 106t, 107t, 110, 123t grammars, reference, 13–14 ICE-New Zealand, 106, 106t, 107t, 123t, grammatical markup see parsers 125, 130–3 grammatical studies, 11–13 ICE-Philippines, 106, 106t, 107t, 110, 123t, Granger, Sylvianne, 26 125, 130–3 Greenbaum, Sidney, 7, 14, 22, 35t, 64, 75, 95 ICE-Singapore, 106t, 110, 123t Greene, B. 
B., 87, 88 ICE-USA composition, 53, 106t Haegeman, Lilliane, 2–3, 4–5, 6 computerizing data, 70, 71, 73–4, 79 Hasselgård, Hilde, 23 copyright, 62 Helsinki Corpus, 145 criteria, 46–7 composition, 20–1, 38 directory structure, 67–8, 68f planning, 46 length, 32–3 research using, 22, 37, 51 record keeping, 64, 65 symbols system, 67 research using see pseudo-titles (corpus Helsinki Corpus of Older Scots, 145 analysis case study) historical corpora, 20–2, 37–8, 46, 51, 78–9 sampling, 58, 60–1 see also ARCHER; Helsinki Corpus ICLE see International Corpus of Learner Hofland, Knut, 23 English Hong Kong University of Science and Ingegneri, Dominique, 42–3 Technology (HKUST) Learner Corpus, International Corpus of English see ICE 26, 145 International Corpus of Learner English Hughes, A., 121–2 (ICLE), 26, 27, 146 ICAME Bibliography, 145 Jespersen, Otto, xii, 13 ICAME CD-ROM, 67, 145 Johansson, Stig, 23
Kalton, Graham, 43 London–Lund Corpus, 147 Kennedy, Graeme, 89 annotation, 82 Kettemann, Bernhard, 27–8 composition, 53 Kirk, John, 52 names in, 75 Kolhapur Corpus of Indian English, 104 research using, 12, 19, 39, 42, 98 Kretzschmar, William A., Jr., 42–3 Longman Dictionary of American English, 15 Kucera, Henry, 1 Longman Dictionary of Contemporary KWIC (key word in context), 115–16, 116f English, 15 Kytö, M., 37 Longman Essential Activator, 27 Kytö, Merja, 42 Longman–Lancaster Corpus, 12, 148 Longman Learner’s Corpus, 26, 27, 148 Labov, W., 9 Longman Spoken and Written English Corpus, Lampeter Corpus, 38, 146 The (LSWE), 14, 90, 148 Lancaster Corpus, 12, 147 LSWE see Longman Spoken and Written see also LOB (Lancaster–Oslo–Bergen) English Corpus, The Corpus Lancaster–Oslo–Bergen Corpus see LOB Mair, Christian, 45 (Lancaster–Oslo–Bergen) Corpus Map Task Corpus, 59, 148 Lancaster Parsed Corpus, 91–2, 96, 147 markup, 67, 68–9 Lancaster/IBM
