List of Corpora and Databases
Links accessed in April 2012.
A
ARCHER = A Representative Corpus of Historical English Registers, version 3.1. 1990–93,
2002, 2007, 2010. Compiled under the supervision of Douglas Biber and Edward Finegan at
Northern Arizona University, University of Southern California, University of Freiburg,
University of Heidelberg, University of Helsinki, Uppsala University, University of
Michigan, University of Manchester, Lancaster University, University of Bamberg,
University of Zurich, University of Trier, University of Salford, and University of Santiago de Compostela. http://www.llc.manchester.ac.uk/research/projects/archer/.
B
B-Brown = B-Brown Corpus. In progress. English Department, University of Zurich. http://www.es.uzh.ch/Subsites/Projects/BBROWN.html.
BE06 = The British English 2006 corpus. 2008. Compiled by Paul Baker. Lancaster
University. http://www.helsinki.fi/varieng/CoRD/corpora/BE06/index.html.
BLOB-1901 = Lancaster-1901 Corpus. In progress. Compiled by Nick Smith, Paul Rayson, and Geoffrey Leech. Lancaster University.
BLOB-1931 = Lancaster-1931 Corpus. 2003–6. Compiled by Geoffrey Leech, Paul Rayson, and Nick Smith. Lancaster University. http://www.helsinki.fi/varieng/CoRD/corpora/BLOB-1931/.
BoE = Bank of English Corpus (Cobuild Corpus). Distributed by Collins WordBanks
Online. http://collinslanguage.com/content-solutions/wordbanks.
BNC = The British National Corpus, version 3 (BNC XML Edition). 2007. Distributed by
Oxford University Computing Services on behalf of the BNC Consortium. http://www.natcorp.ox.ac.uk/.
Brown = A Standard Corpus of Present-Day Edited American English, for use with Digital
Computers. 1964, 1971, 1979. Compiled by W. Nelson Francis and Henry Kučera. Brown
University. http://www.helsinki.fi/varieng/CoRD/corpora/BROWN/.
BYU-BNC = BYU-BNC: The British National Corpus. 2004– . Interface by Mark Davies. http://corpus.byu.edu/bnc/.
C
CED = A Corpus of English Dialogues 1560–1760. 2006. Compiled under the supervision of
Merja Kytö (Uppsala University) and Jonathan Culpeper (Lancaster University). http://www.helsinki.fi/varieng/CoRD/corpora/CED/index.html.
CEEC = Corpus of Early English Correspondence. 1998. Compiled by Terttu Nevalainen,
Helena Raumolin-Brunberg, Jukka Keränen, Minna Nevala, Arja Nurmi, and Minna
Palander-Collin. Department of English, University of Helsinki. http://www.helsinki.fi/varieng/CoRD/corpora/CEEC/index.html.
CEECS = Corpus of Early English Correspondence Sampler. 1998. Compiled by Jukka
Keränen, Minna Nevala, Terttu Nevalainen, Arja Nurmi, Minna Palander-Collin, and Helena
Raumolin-Brunberg. Department of English, University of Helsinki. http://www.helsinki.fi/varieng/CoRD/corpora/CEEC/ceecs.html.
CEEM = Corpus of Early English Medical Writing. In progress. Compiled under the supervision of Irma Taavitsainen and Päivi Pahta. University of Helsinki. See MEMT and
EMEMT.
CHF = Corpus of Historical Fiction. 2010. Compiled by Bethany Gray. Northern Arizona
University.
CIE = A Corpus of Irish English. 2003. Compiled by Raymond Hickey. University of
Duisburg-Essen. http://www.uni-due.de/CP/CIE.htm.
CLMETEV = The Corpus of Late Modern English Texts (Extended Version). 2006.
Compiled by Hendrik De Smet. Department of Linguistics, University of Leuven. http://www.helsinki.fi/varieng/CoRD/corpora/CLMETEV/.
CMSW = Corpus of Modern Scottish Writing. In progress. Principal investigator: John
Corbett. University of Glasgow. http://www.scottishcorpus.ac.uk/cmsw.
COCA = The Corpus of Contemporary American English. 2008– . Compiled by Mark
Davies. Brigham Young University. http://corpus.byu.edu/coca/.
COERP = Corpus of English Religious Prose. In progress. Compiled by Thomas Kohnen,
Tanja Rütten, Ingvilt Marcoe, Kirsten Gather, and Dorothee Groeger. University of Cologne. http://www.helsinki.fi/varieng/CoRD/corpora/COERP/.
COHA = Corpus of Historical American English. 2010– . Compiled by Mark Davies.
Brigham Young University. http://corpus.byu.edu/coha/.
CONCE = A Corpus of Nineteenth-Century English. 2000. Compiled by Merja Kytö
(Uppsala University) and Juhani Rudanko (University of Tampere).
COOEE = Corpus of Oz Early English. 2004. Compiled by Clemens Fritz. Free University of
Berlin. http://www.helsinki.fi/varieng/CoRD/corpora/COOEE/.
CoRD = Corpus Resource Database. 2007– . http://www.helsinki.fi/varieng/CoRD/index.html.
CSC = Corpus of Scottish Correspondence, 1500–1715. 2007. Compiled by Anneli
Meurman-Solin. University of Helsinki. http://www.helsinki.fi/varieng/CoRD/corpora/CSC/index.html.
D
DCPSE = Diachronic Corpus of Present-Day Spoken English. 2002–4. Compiled under the supervision of Bas Aarts. Survey of English Usage, University College London. http://www.ucl.ac.uk/english-usage/projects/dcpse/.
DECTE = Diachronic Electronic Corpus of Tyneside English. In progress. Compiled under the supervision of Karen P. Corrigan. Newcastle University. http://research.ncl.ac.uk/decte/.
DOEC = Dictionary of Old English Corpus. Original release 1981 compiled by Angus
Cameron, Ashley Crandell Amos, Sharon Butler, and Antonette diPaolo Healey. Release
2009 compiled by Antonette diPaolo Healey, Joan Holland, Ian McDougall, and David
McDougall, with Xin Xiang. University of Toronto. http://www.helsinki.fi/varieng/CoRD/corpora/DOEC/index.html.
E
ECCO = Eighteenth-Century Collections Online. http://gale.cengage.co.uk/product-highlights/history/eighteenth-century-collections-online.aspx.
EEBO = Early English Books Online. http://eebo.chadwyck.com/home.
EMC = Corpus of Early Medieval Coin Finds and Sylloge of Coins of the British Isles databases (Fitzwilliam Museum, Cambridge). http://www.fitzmuseum.cam.ac.uk/coins/emc/.
EMEMT = Early Modern English Medical Texts. 2010. Compiled by Irma Taavitsainen
(University of Helsinki), Päivi Pahta (University of Tampere), Martti Mäkinen (Svenska handelshögskolan), Turo Hiltunen, Ville Marttila, Maura Ratia, Carla Suhr, and Jukka
Tyrkkö (University of Helsinki). http://www.helsinki.fi/varieng/CoRD/corpora/CEEM/EMEMTindex.html.
ESTC = English Short Title Catalogue. http://estc.bl.uk.
eWAVE = The electronic World Atlas of Varieties of English. 2011. Edited by Bernd
Kortmann and Kerstin Lunkenheimer. Leipzig: Max Planck Institute for Evolutionary
Anthropology. http://www.ewave-atlas.org/.
F
FLOB/F-LOB = The Freiburg–LOB Corpus of British English. Original release 1999 compiled by Christian Mair (Albert-Ludwigs-Universität Freiburg). Release 2007 compiled by Christian Mair (Albert-Ludwigs-Universität Freiburg) and Geoffrey Leech (University of
Lancaster). http://www.helsinki.fi/varieng/CoRD/corpora/FLOB/.
Frown = The Freiburg-Brown Corpus. Original release 1999 compiled by Christian Mair
(Albert-Ludwigs-Universität Freiburg). Release 2007 compiled by Christian Mair (Albert-Ludwigs-Universität Freiburg) and Geoffrey Leech (University of Lancaster). http://www.helsinki.fi/varieng/CoRD/corpora/FROWN/.
G
Google Books (American English) Corpus. 2011– . Compiled by Mark Davies. Brigham
Young University. http://googlebooks.byu.edu/.
The Gutenberg Archive. 2011. http://www.gutenberg.org/.
H
HC = Helsinki Corpus of English Texts. 1991. Compiled by Matti Rissanen (Project leader),
Merja Kytö (Project secretary); Leena Kahlas-Tarkka, Matti Kilpiö (Old English); Saara
Nevanlinna, Irma Taavitsainen (Middle English); Terttu Nevalainen, Helena Raumolin-
Brunberg (Early Modern English). Department of English, University of Helsinki. http://www.helsinki.fi/varieng/CoRD/corpora/HelsinkiCorpus/index.html.
HCOS = Helsinki Corpus of Older Scots. 1995. Compiled by Anneli Meurman-Solin.
Department of English, University of Helsinki. http://www.helsinki.fi/varieng/CoRD/corpora/HCOS/.
I
ICAME = International Computer Archive of Modern and Medieval English. http://icame.uib.no/.
ICE = The International Corpus of English, version 2. 2006. Coordinated by Gerald Nelson
(University of Hong Kong). http://ice-corpora.net/ice/index.htm.
L
LAEME = A Linguistic Atlas of Early Middle English, 1150–1325. 2007. Compiled by
Margaret Laing and Roger Lass. University of Edinburgh. http://www.lel.ed.ac.uk/ihd/laeme1/laeme1.html.
LAMSAS = Linguistic Atlas of the Middle and South Atlantic States. http://us.english.uga.edu/lamsas/.
LAOS = A Linguistic Atlas of Older Scots, Phase 1: 1380–1500. 2008. Compiled by Keith
Williamson. University of Edinburgh. http://www.lel.ed.ac.uk/ihd/laos1/laos1.html.
Lion = Literature Online. http://lion.chadwyck.co.uk/.
LOB = The Lancaster-Oslo/Bergen Corpus, original version. 1970–78. Compiled by
Geoffrey Leech (Lancaster University), Stig Johansson (University of Oslo), and Knut
Hofland (University of Bergen). http://www.helsinki.fi/varieng/CoRD/corpora/LOB/.
London Lives: 1690–1800. Crime, Poverty and Social Policy in the Metropolis. Version 1.0.
1 September 2010. http://www.londonlives.org/.
LSWE = The Longman Spoken and Written English Corpus. http://www.pearsonlongman.com/dictionaries/corpus/.
M
MEG-C = The Middle English Grammar Corpus, version 2011.1. Compiled by Merja
Stenroos, Martti Mäkinen, Simon Horobin, and Jeremy J. Smith. University of Stavanger. http://www.uis.no/research/culture/the_middle_english_grammar_project/meg-c/.
MEMT = Middle English Medical Texts. 2005. Compiled by Irma Taavitsainen (University of Helsinki), Päivi Pahta (University of Tampere), and Martti Mäkinen (University of
Stavanger). http://www.helsinki.fi/varieng/CoRD/corpora/CEEM/MEMTindex.html.
N
NECTE = The Newcastle Electronic Corpus of Tyneside English. 2005. Compiled by Karen
Corrigan (Newcastle University), Hermann Moisl (Newcastle University), and Joan Beal
(University of Sheffield). http://www.helsinki.fi/varieng/CoRD/corpora/NECTE/.
NECTE2 = The Newcastle Electronic Corpus of Tyneside English 2. In progress. Compiled
by Karen Corrigan. Newcastle University. http://www.research.ncl.ac.uk/necte2/.
NEET = Network of Eighteenth-century English Texts. 2007. Compiled by Susan
Fitzmaurice. University of Sheffield. http://sites.google.com/site/helontheweb/corpora.
NYT = Corpus of Historical Newspaper Writing (New York Times). 2010–11. Compiled by
Bethany Gray. Northern Arizona University.
O
OBC = Old Bailey Corpus. In progress. Compiled under the supervision of Magnus Huber.
University of Giessen. http://www.uni-giessen.de/oldbaileycorpus/index.php.
Old Bailey Online. Version 6.0. March 2011. http://www.oldbaileyonline.org/.
ONZE = Origins of New Zealand English Corpus. In progress. Compiled by the ONZE project team. University of Canterbury. http://www.lacl.canterbury.ac.nz/onze/index.html.
P
PASE = Prosopography of Anglo-Saxon England. 2010. http://www.pase.ac.uk/index.html/.
PCEEC = The Parsed Corpus of Early English Correspondence. 2006. Annotated by Ann
Taylor, Arja Nurmi, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. Compiled by the CEEC Project Team. University of York and University of Helsinki. Distributed through the Oxford Text Archive. http://www.helsinki.fi/varieng/CoRD/corpora/CEEC/pceec.html.
PPCEME = Penn-Helsinki Parsed Corpus of Early Modern English. 2004. Compiled by
Anthony Kroch, Beatrice Santorini, and Ariel Diertani. University of Pennsylvania.
http://www.ling.upenn.edu/hist-corpora/PPCEME-RELEASE-2/index.html.
PPCMBE = Penn Parsed Corpus of Modern British English. 2010. Compiled by Anthony
Kroch, Beatrice Santorini, and Ariel Diertani. University of Pennsylvania.
http://www.ling.upenn.edu/hist-corpora/PPCMBE-RELEASE-1/index.html.
PPCME2 = Penn-Helsinki Parsed Corpus of Middle English, 2nd edn. 2000. Compiled by
Anthony Kroch and Ann Taylor. University of Pennsylvania. http://www.ling.upenn.edu/hist-corpora/PPCME2-RELEASE-3/index.html.
PT = Corpus of Historical Science Writing (The Philosophical Transactions of the Royal
Society). 2010–11. Compiled by Bethany Gray. Northern Arizona University.
S
S = The Electronic Sawyer. An Online Catalogue of Anglo-Saxon Charters. 2011. http://www.esawyer.org.uk/.
SAVE = The South Asian Varieties of English Corpus. 2011. Compiled by Joybrato
Mukherjee, Tobias Bernaisch, Christopher Koch, and Marco Schilk. University of Giessen. http://www.uni-giessen.de/cms/faculties/f05/engl/ling/research/save.
SCOTS = Scottish Corpus of Texts and Speech. 2007. Compiled by Professor John Corbett
(Principal Investigator), Dr. Wendy Anderson (Research Assistant 2004–7), Dr. Fiona
Douglas (Research Assistant 2001–3), Dave Beavan (Computing Manager), Professor
Christian Kay, Jean Anderson, Dr. Jane Stuart-Smith, Louise Sweeney, Cerwyss O’Hare, and
Flora Edmonds. Department of English Language, University of Glasgow. http://www.scottishcorpus.ac.uk.
The Statesman. 2011. http://www.thestatesman.net/.
T
TEAMS = The Consortium for the Teaching of the Middle Ages. http://www.lib.rochester.edu/camelot/teams/tmsmenu.htm.
TIME = TIME Magazine Corpus. 2007– . Compiled by Mark Davies. Brigham Young
University. http://corpus.byu.edu/time.
W
WebCorp = The Web as Corpus. Created, operated, and maintained by the Research and
Development Unit for English Studies, School of English, Birmingham City University. http://www.webcorp.org.uk.
Y
YCOE = The York-Toronto-Helsinki Parsed Corpus of Old English Prose. 2003. Compiled by Ann Taylor, Anthony Warner, Susan Pintzuk, and Frank Beths. Department of Language and Linguistic Science, University of York. http://www.helsinki.fi/varieng/CoRD/corpora/YCOE/.
Z
ZEN = Zurich English Newspaper Corpus, version 1.0. 2004. Compiled by Udo Fries, Hans
Martin Lehmann, Beni Ruef, Peter Schneider, Patrick Studer, Caren auf dem Keller, Beat
Nietlispach, Sandra Engler, Sabine Hensel, and Franziska Zeller. English Department,
University of Zurich. http://es-zen.unizh.ch.