List of Corpora and Databases

Links accessed in April 2012.

ARCHER = A Representative Corpus of Historical English Registers, version 3.1. 1990–93,

2002, 2007, 2010. Compiled under the supervision of Douglas Biber and Edward Finegan at

Northern Arizona University, University of Southern California, University of Freiburg,

University of Heidelberg, University of Helsinki, Uppsala University, University of

Michigan, University of Manchester, Lancaster University, University of Bamberg,

University of Zurich, University of Trier, University of Salford, and University of Santiago de Compostela. http://www.llc.manchester.ac.uk/research/projects/archer/.

B-Brown = B-Brown Corpus. In progress. English Department, University of Zurich. http://www.es.uzh.ch/Subsites/Projects/BBROWN.html.

BE06 = The British English 2006 corpus. 2008. Compiled by Paul Baker. Lancaster

University. http://www.helsinki.fi/varieng/CoRD/corpora/BE06/index.html.

BLOB-1901 = Lancaster-1901 Corpus. In progress. Compiled by Nick Smith, Paul Rayson, and Geoffrey Leech. Lancaster University.

BLOB-1931 = Lancaster-1931 Corpus. 2003–6. Compiled by Geoffrey Leech, Paul Rayson, and Nick Smith. Lancaster University. http://www.helsinki.fi/varieng/CoRD/corpora/BLOB-1931/.

BoE = Bank of English Corpus (Cobuild Corpus). Distributed by Collins WordBanks

Online. http://collinslanguage.com/content-solutions/wordbanks.

BNC = The British National Corpus, version 3 (BNC XML Edition). 2007. Distributed by

Oxford University Computing Services on behalf of the BNC Consortium. http://www.natcorp.ox.ac.uk/.

Brown = A Standard Corpus of Present-Day Edited American English, for use with Digital

Computers. 1964, 1971, 1979. Compiled by W. Nelson Francis and Henry Kučera. Brown

University. http://www.helsinki.fi/varieng/CoRD/corpora/BROWN/.

BYU-BNC = BYU-BNC: The British National Corpus. 2004– . Interface by Mark Davies. http://corpus.byu.edu/bnc/.

CED = A Corpus of English Dialogues 1560–1760. 2006. Compiled under the supervision of

Merja Kytö (Uppsala University) and Jonathan Culpeper (Lancaster University). http://www.helsinki.fi/varieng/CoRD/corpora/CED/index.html.

CEEC = Corpus of Early English Correspondence. 1998. Compiled by Terttu Nevalainen,

Helena Raumolin-Brunberg, Jukka Keränen, Minna Nevala, Arja Nurmi, and Minna

Palander-Collin. Department of English, University of Helsinki. http://www.helsinki.fi/varieng/CoRD/corpora/CEEC/index.html.

CEECS = Corpus of Early English Correspondence Sampler. 1998. Compiled by Jukka

Keränen, Minna Nevala, Terttu Nevalainen, Arja Nurmi, Minna Palander-Collin, and Helena

Raumolin-Brunberg. Department of English, University of Helsinki. http://www.helsinki.fi/varieng/CoRD/corpora/CEEC/ceecs.html.

CEEM = Corpus of Early English Medical Writing. In progress. Compiled under the supervision of Irma Taavitsainen and Päivi Pahta. University of Helsinki. See MEMT and

EMEMT.

CHF = Corpus of Historical Fiction. 2010. Compiled by Bethany Gray. Northern Arizona

University.

CIE = A Corpus of Irish English. 2003. Compiled by Raymond Hickey. University of

Duisburg-Essen. http://www.uni-due.de/CP/CIE.htm.

CLMETEV = The Corpus of Late Modern English Texts (Extended Version). 2006.

Compiled by Hendrik De Smet. Department of Linguistics, University of Leuven. http://www.helsinki.fi/varieng/CoRD/corpora/CLMETEV/.

CMSW = Corpus of Modern Scottish Writing. In progress. Principal investigator: John

Corbett. University of Glasgow. http://www.scottishcorpus.ac.uk/cmsw.

COCA = The Corpus of Contemporary American English. 2008– . Compiled by Mark

Davies. Brigham Young University. http://corpus.byu.edu/coca/.

COERP = Corpus of English Religious Prose. In progress. Compiled by Thomas Kohnen,

Tanja Rütten, Ingvilt Marcoe, Kirsten Gather, and Dorothee Groeger. University of Cologne. http://www.helsinki.fi/varieng/CoRD/corpora/COERP/.

COHA = Corpus of Historical American English. 2010– . Compiled by Mark Davies.

Brigham Young University. http://corpus.byu.edu/coha/.

CONCE = A Corpus of Nineteenth-Century English. 2000. Compiled by Merja Kytö

(Uppsala University) and Juhani Rudanko (University of Tampere).

COOEE = Corpus of Oz Early English. 2004. Compiled by Clemens Fritz. Free University of

Berlin. http://www.helsinki.fi/varieng/CoRD/corpora/COOEE/.

CoRD = Corpus Resource Database. 2007– . http://www.helsinki.fi/varieng/CoRD/index.html.

CSC = Corpus of Scottish Correspondence, 1500–1715. 2007. Compiled by Anneli

Meurman-Solin. University of Helsinki. http://www.helsinki.fi/varieng/CoRD/corpora/CSC/index.html.

DCPSE = Diachronic Corpus of Present-Day Spoken English. 2002–4. Compiled under the supervision of Bas Aarts. Survey of English Usage, University College London. http://www.ucl.ac.uk/english-usage/projects/dcpse/.

DECTE = Diachronic Electronic Corpus of Tyneside English. In progress. Compiled under the supervision of Karen P. Corrigan. Newcastle University. http://research.ncl.ac.uk/decte/.

DOEC = Dictionary of Old English Corpus. Original release 1981 compiled by Angus

Cameron, Ashley Crandell Amos, Sharon Butler, and Antonette diPaolo Healey. Release

2009 compiled by Antonette diPaolo Healey, Joan Holland, Ian McDougall, and David

McDougall, with Xin Xiang. University of Toronto. http://www.helsinki.fi/varieng/CoRD/corpora/DOEC/index.html.

ECCO = Eighteenth-Century Collections Online. http://gale.cengage.co.uk/product-highlights/history/eighteenth-century-collections-online.aspx.

EEBO = Early English Books Online. http://eebo.chadwyck.com/home.

EMC = Corpus of Early Medieval Coin Finds and Sylloge of Coins of the British Isles databases (Fitzwilliam Museum, Cambridge). http://www.fitzmuseum.cam.ac.uk/coins/emc/.

EMEMT = Early Modern English Medical Texts. 2010. Compiled by Irma Taavitsainen

(University of Helsinki), Päivi Pahta (University of Tampere), Martti Mäkinen (Svenska handelshögskolan), Turo Hiltunen, Ville Marttila, Maura Ratia, Carla Suhr, and Jukka

Tyrkkö (University of Helsinki). http://www.helsinki.fi/varieng/CoRD/corpora/CEEM/EMEMTindex.html.

ESTC = English Short Title Catalogue. http://estc.bl.uk.

eWAVE = The electronic World Atlas of Varieties of English. 2011. Edited by Bernd

Kortmann and Kerstin Lunkenheimer. Leipzig: Max Planck Institute for Evolutionary

Anthropology. http://www.ewave-atlas.org/.

FLOB/F-LOB = The Freiburg–LOB Corpus of British English. Original release 1999 compiled by Christian Mair (Albert-Ludwigs-Universität Freiburg). Release 2007 compiled by Christian Mair (Albert-Ludwigs-Universität Freiburg) and Geoffrey Leech (University of

Lancaster). http://www.helsinki.fi/varieng/CoRD/corpora/FLOB/.

Frown = The Freiburg-Brown Corpus. Original release 1999 compiled by Christian Mair

(Albert-Ludwigs-Universität Freiburg). Release 2007 compiled by Christian Mair (Albert-Ludwigs-Universität Freiburg) and Geoffrey Leech (University of Lancaster). http://www.helsinki.fi/varieng/CoRD/corpora/FROWN/.

Google Books (American English) Corpus. 2011– . Compiled by Mark Davies. Brigham

Young University. http://googlebooks.byu.edu/.

The Gutenberg Archive. 2011. http://www.gutenberg.org/.

HC = Helsinki Corpus of English Texts. 1991. Compiled by Matti Rissanen (Project leader),

Merja Kytö (Project secretary); Leena Kahlas-Tarkka, Matti Kilpiö (Old English); Saara

Nevanlinna, Irma Taavitsainen (Middle English); Terttu Nevalainen, Helena Raumolin-Brunberg (Early Modern English). Department of English, University of Helsinki. http://www.helsinki.fi/varieng/CoRD/corpora/HelsinkiCorpus/index.html.

HCOS = Helsinki Corpus of Older Scots. 1995. Compiled by Anneli Meurman-Solin.

Department of English, University of Helsinki. http://www.helsinki.fi/varieng/CoRD/corpora/HCOS/.

ICAME = International Computer Archive of Modern and Medieval English. http://icame.uib.no/.

ICE = The International Corpus of English, version 2. 2006. Coordinated by Gerald Nelson

(University of Hong Kong). http://ice-corpora.net/ice/index.htm.

LAEME = A Linguistic Atlas of Early Middle English, 1150–1325. 2007. Compiled by

Margaret Laing and Roger Lass. University of Edinburgh. http://www.lel.ed.ac.uk/ihd/laeme1/laeme1.html.

LAMSAS = Linguistic Atlas of the Middle and South Atlantic States. http://us.english.uga.edu/lamsas/.

LAOS = A Linguistic Atlas of Older Scots, Phase 1: 1380–1500. 2008. Compiled by Keith

Williamson. University of Edinburgh. http://www.lel.ed.ac.uk/ihd/laos1/laos1.html.

Lion = Literature Online. http://lion.chadwyck.co.uk/.

LOB = The Lancaster-Oslo/Bergen Corpus, original version. 1970–78. Compiled by

Geoffrey Leech (Lancaster University), Stig Johansson (University of Oslo), and Knut

Hofland (University of Bergen). http://www.helsinki.fi/varieng/CoRD/corpora/LOB/.

London Lives: 1690–1800. Crime, Poverty and Social Policy in the Metropolis. Version 1.0.

1 September 2010. http://www.londonlives.org/.

LSWE = The Longman Spoken and Written English Corpus. http://www.pearsonlongman.com/dictionaries/corpus/.

MEG-C = The Middle English Grammar Corpus, version 2011.1. Compiled by Merja

Stenroos, Martti Mäkinen, Simon Horobin, and Jeremy J. Smith. University of Stavanger. http://www.uis.no/research/culture/the_middle_english_grammar_project/meg-c/.

MEMT = Middle English Medical Texts. 2005. Compiled by Irma Taavitsainen (University of Helsinki), Päivi Pahta (University of Tampere), and Martti Mäkinen (University of

Stavanger). http://www.helsinki.fi/varieng/CoRD/corpora/CEEM/MEMTindex.html.

NECTE = The Newcastle Electronic Corpus of Tyneside English. 2005. Compiled by Karen

Corrigan (Newcastle University), Hermann Moisl (Newcastle University), and Joan Beal

(University of Sheffield). http://www.helsinki.fi/varieng/CoRD/corpora/NECTE/.

NECTE2 = The Newcastle Electronic Corpus of Tyneside English 2. In progress. Compiled

by Karen Corrigan. Newcastle University. http://www.research.ncl.ac.uk/necte2/.

NEET = Network of Eighteenth-century English Texts. 2007. Compiled by Susan

Fitzmaurice. University of Sheffield. http://sites.google.com/site/helontheweb/corpora.

NYT = Corpus of Historical Newspaper Writing (New York Times). 2010–11. Compiled by

Bethany Gray. Northern Arizona University.

OBC = Old Bailey Corpus. In progress. Compiled under the supervision of Magnus Huber.

University of Giessen. http://www.uni-giessen.de/oldbaileycorpus/index.php.

Old Bailey Online. Version 6.0. March 2011. http://www.oldbaileyonline.org/.

ONZE = Origins of New Zealand English Corpus. In progress. Compiled by the ONZE project team. University of Canterbury. http://www.lacl.canterbury.ac.nz/onze/index.html.

PASE = Prosopography of Anglo-Saxon England. 2010. http://www.pase.ac.uk/index.html/.

PCEEC = The Parsed Corpus of Early English Correspondence. 2006. Annotated by Ann

Taylor, Arja Nurmi, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. Compiled by the CEEC Project Team. University of York and University of Helsinki. Distributed through the Oxford Text Archive. http://www.helsinki.fi/varieng/CoRD/corpora/CEEC/pceec.html.

PPCEME = Penn-Helsinki Parsed Corpus of Early Modern English. 2004. Compiled by

Anthony Kroch, Beatrice Santorini, and Ariel Diertani. University of Pennsylvania.

http://www.ling.upenn.edu/hist-corpora/PPCEME-RELEASE-2/index.html.

PPCMBE = Penn Parsed Corpus of Modern British English. 2010. Compiled by Anthony

Kroch, Beatrice Santorini, and Ariel Diertani. University of Pennsylvania.

http://www.ling.upenn.edu/hist-corpora/PPCMBE-RELEASE-1/index.html.

PPCME2 = Penn-Helsinki Parsed Corpus of Middle English, 2nd edn. 2000. Compiled by

Anthony Kroch and Ann Taylor. University of Pennsylvania. http://www.ling.upenn.edu/hist-corpora/PPCME2-RELEASE-3/index.html.

PT = Corpus of Historical Science Writing (The Philosophical Transactions of the Royal

Society). 2010–11. Compiled by Bethany Gray. Northern Arizona University.

S = The Electronic Sawyer. An Online Catalogue of Anglo-Saxon Charters. 2011. http://www.esawyer.org.uk/.

SAVE = The South Asian Varieties of English Corpus. 2011. Compiled by Joybrato

Mukherjee, Tobias Bernaisch, Christopher Koch, and Marco Schilk. University of Giessen. http://www.uni-giessen.de/cms/faculties/f05/engl/ling/research/save.

SCOTS = Scottish Corpus of Texts and Speech. 2007. Compiled by Professor John Corbett

(Principal Investigator), Dr. Wendy Anderson (Research Assistant 2004–7), Dr. Fiona

Douglas (Research Assistant 2001–3), Dave Beavan (Computing Manager), Professor

Christian Kay, Jean Anderson, Dr. Jane Stuart-Smith, Louise Sweeney, Cerwyss O’Hare, and

Flora Edmonds. Department of English Language, University of Glasgow. http://www.scottishcorpus.ac.uk.

The Statesman. 2011. http://www.thestatesman.net/.

TEAMS = The Consortium for the Teaching of the Middle Ages. http://www.lib.rochester.edu/camelot/teams/tmsmenu.htm.

TIME = TIME Magazine Corpus. 2007– . Compiled by Mark Davies. Brigham Young

University. http://corpus.byu.edu/time.

WebCorp = The Web as Corpus. Created, operated, and maintained by the Research and

Development Unit for English Studies, School of English, Birmingham City University. http://www.webcorp.org.uk.

YCOE = The York-Toronto-Helsinki Parsed Corpus of Old English Prose. 2003. Compiled by Ann Taylor, Anthony Warner, Susan Pintzuk, and Frank Beths. Department of Language and Linguistic Science, University of York. http://www.helsinki.fi/varieng/CoRD/corpora/YCOE/.

ZEN = Zurich English Newspaper Corpus, version 1.0. 2004. Compiled by Udo Fries, Hans

Martin Lehmann, Beni Ruef, Peter Schneider, Patrick Studer, Caren auf dem Keller, Beat

Nietlispach, Sandra Engler, Sabine Hensel, and Franziska Zeller. English Department,

University of Zurich. http://es-zen.unizh.ch.