RESOURCES FOR CORPUS LINGUISTICS 1. (FIRST GENERATION) WRITTEN CORPORA Brown Corpus Properties: 1m words, written, American, 1961 Design: original (see Corpus Design handout) Availability: ICAME-CD, Free online access at LDC Lancaster-Oslo/Bergen (LOB) Properties: 1m words, written, British, 1961 Design: based on BROWN Availability: ICAME-CD Kolhapur Corpus of Indian English Properties: 1m words, Written, Indian, 1978 Design: roughly based on BROWN (less specific fiction, more general fiction) Availability: ICAME-CD Wellington Corpus of Written New Zealand English Properties: 1m words, written, NZ, 1986-90 Design: roughly based on BROWN (no subcategories within Fiction) Availability: ICAME-CD, Wellington Corpus CD Australian Corpus of English (ACE) Properties: 1m words, written, Australian, 1986 Design: roughly based on BROWN Availability: ICAME-CD, ACE-CD London-Lund (LLC) Properties: 0.5m words, Spoken, British, 1959ff Design: derived from SEU Availability: ICAME-CD Freiburg-Brown (FROWN) Properties: 1m words, Written, American, 1991 Design: based on BROWN Availability: ICAME-CD Freiburg-LOB (FLOB) Properties: 1m words, Written, British, 1991 Design: based on LOB (i.e. BROWN) Availability: ICAME-CD 2. MEGA CORPORA Cobuild Bank of English Properties: > 300m words, mostly written, mostly British (some American) Design: opportunistic Availability: HarperCollins, Online demo at Cobulid web site British National Corpus (BNC) Properties: 100m words, spoken (10m) and written (90m), 1990s Design: original (see Corpus Design handout) Availability: BNC CD, Online demo at BNC web site. Corpus Linguistics 1/3 © 2003 Anatol Stefanowitsch
[email protected] Resources for corpus linguistics 2/3 3. VARIOUS SPECIALIZED CORPORA Spoken Corpora Corpus of Spoken American English Properties: growing, spoken, American Design: free conversation among friends in natural setting, tape recorded by participants Availability: Free download at Talkbank Switchboard (SWB) Properties: ca.